Similar Articles
20 similar articles found (search time: 31 ms)
1.
SIGNIFICANCE TESTS HAVE THEIR PLACE (cited by 1: 0 self-citations, 1 by others)
Abstract— Null-hypothesis significance tests (NHST), properly used, tell us whether we have sufficient evidence to be confident of the sign of the population effect—but only if we abandon two-valued logic in favor of Kaiser's (1960) three-alternative hypothesis tests. Confidence intervals provide a useful addition to NHSTs, and can be used to provide the same sign-determination function as NHST. However, when so used, confidence intervals are subject to exactly the same Type I, II, and III error rates as NHST. In addition, NHSTs provide two pieces of information about our data—maximum probability of a Type III error and probability of a successful exact replication—that confidence intervals do not. The proposed alternative to NHST is just as susceptible to misinterpretation as is NHST. The problem of bias due to censoring of data collection or publication can be handled by providing archives for all methodologically sound data sets, but reserving interpretations and conclusions for statistically significant results.
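
As an illustration of the sign-determination reading of NHST and confidence intervals described above, here is a minimal sketch in Python; the one-sample data, the alpha level, and the use of a t test are assumptions for illustration, not taken from the article.

```python
# Illustrative sketch: three-alternative sign determination via NHST and via a CI.
# The sample data and alpha are hypothetical; they are not from the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=30)   # hypothetical one-sample data
alpha = 0.05

t, p = stats.ttest_1samp(x, popmean=0.0)
if p < alpha:
    nhst_conclusion = "positive" if t > 0 else "negative"
else:
    nhst_conclusion = "sign undetermined"

# Equivalent sign determination from the (1 - alpha) confidence interval
se = stats.sem(x)
half_width = stats.t.ppf(1 - alpha / 2, df=len(x) - 1) * se
lo, hi = x.mean() - half_width, x.mean() + half_width
if lo > 0:
    ci_conclusion = "positive"
elif hi < 0:
    ci_conclusion = "negative"
else:
    ci_conclusion = "sign undetermined"

print(nhst_conclusion, ci_conclusion)   # the two decisions always agree
```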

2.
Although dogs are almost totally incapable of symbolic behaviour, they can hope, for a dog's behaviour can manifest not only a desire for something but varying degrees of expectation that it will get what it desires; but since they are almost totally incapable of symbolic behaviour, nothing they do can indicate that they both desire something and yet are certain that they will not get it. So the suggestion that dogs entertain idle wishes is, apparently, vacuous, i.e. untestable, or nonsensical. Nonetheless, we can imagine situations in which we would be tempted to say of a dog that it had an idle wish, but since idle wishes so often and typically require language, we should be reluctant to impute it.

3.
An attempt has been made in this paper to show that culture fair tests have some problems associated with them. These tests should be examined and reviewed closely before being used and should not be regarded as the answer to testing the culturally disadvantaged. The following points were made in this paper: Culture fair tests measure different psychological functions. Culture fair tests today measure such functions as spatial visualization, abstract reasoning, perceptual speed, etc. Culture fair tests vary considerably in format. Some are pencil-and-paper tests, some are performance tests. Some use verbal instructions, others do not. There are many test parameters along which culture fair tests now vary. Some evidence suggests that culture fair tests may actually increase the differential between the culturally disadvantaged and the more advantaged population. Use of these tests may not be in the best interests of minority groups. It is not yet clear on which kinds of items culturally disadvantaged people perform more poorly. Some evidence suggests that they do better on verbal items and worse on perceptual items, which is in contrast to the assumption of most proponents of culture-fair tests. The validity of culture fair tests has not been shown to be better than that of more traditional tests. In contrast, some research even indicates that they do not show relationships as high. What is to be done, if anything, about the test differentials between the culturally disadvantaged and the majority population? Some individuals (Lorge, 1964; Coffman, 1964) agree that the elimination of group differences on tests is futile and argue that the real task at hand is a realistic attempt to study the behavioral significance of test differences. In essence, this is an all-out attempt to collect validation information. Does a particular score for a black examinee have the same behavioral implications as a higher (or lower) test score for a white examinee? Are there criterion differences that are related to test differences? Do differential validities exist for various subgroups? Are the standard errors of estimate different for different groups? This approach is essentially what has been pursued by individuals investigating the “moderating” effects of subgrouping by race and/or socio-economic factors. The investigation of test differences within and between subgroups is called for. In my opinion, attempting to mask test differentials by using culture fair tests may in actuality have an effect opposite to what was intended. Test differentials may actually increase, making it more difficult for culturally disadvantaged individuals to be selected into schools, jobs, etc. Clearly, the construction of culture fair tests is not the only answer to testing the disadvantaged.
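
The validation questions raised above (differential validity, subgroup standard errors of estimate) can be illustrated with a brief sketch; the data, the subgroup labels, and the criterion are hypothetical and are not from the paper.

```python
# Hypothetical sketch of a subgroup validation check: does the test predict the
# criterion equally well, with the same standard error of estimate, in each group?
import numpy as np

def validity_and_see(test_scores, criterion):
    r = np.corrcoef(test_scores, criterion)[0, 1]          # validity coefficient
    see = np.std(criterion, ddof=1) * np.sqrt(1 - r ** 2)  # standard error of estimate
    return r, see

rng = np.random.default_rng(1)
for group in ("group_A", "group_B"):                        # hypothetical subgroups
    x = rng.normal(50, 10, 200)                             # test scores
    y = 0.5 * x + rng.normal(0, 8, 200)                     # criterion (e.g., GPA)
    r, see = validity_and_see(x, y)
    print(f"{group}: validity r = {r:.2f}, SEE = {see:.2f}")
```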

4.
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non‐normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann–Yuen test of equivalence, which utilizes trimmed means, with those of the previously recommended Schuirmann and Schuirmann–Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann–Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann–Welch tests, and the power of the Schuirmann–Yuen was substantially greater than that of the Schuirmann or Schuirmann–Welch tests when distributions were skewed or outliers were present. The Schuirmann–Yuen test is recommended for assessing clinical significance with normative comparisons.
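
A minimal sketch of a Schuirmann-style two one-sided tests (TOST) procedure built on Yuen's trimmed-means statistic, in the spirit of the Schuirmann–Yuen test evaluated above; the 20% trimming proportion, the equivalence bound, and the simulated data are illustrative assumptions, not values from the study.

```python
# Hedged sketch of a TOST equivalence test using Yuen's trimmed-means statistic.
# Trimming proportion, equivalence bound, and data are illustrative assumptions.
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

def yuen_se_and_df(x, y, trim=0.2):
    """Standard error and df for the difference in trimmed means (Yuen-style)."""
    def parts(a):
        n = len(a)
        h = n - 2 * int(np.floor(trim * n))                 # effective n after trimming
        sw2 = np.var(np.asarray(winsorize(np.sort(a), (trim, trim))), ddof=1)
        return (n - 1) * sw2 / (h * (h - 1)), h             # winsorized-variance term
    d1, h1 = parts(x)
    d2, h2 = parts(y)
    se = np.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return se, df

def schuirmann_yuen_tost(x, y, bound, trim=0.2, alpha=0.05):
    diff = stats.trim_mean(x, trim) - stats.trim_mean(y, trim)
    se, df = yuen_se_and_df(x, y, trim)
    p_lower = stats.t.sf((diff + bound) / se, df)           # H0: diff <= -bound
    p_upper = stats.t.cdf((diff - bound) / se, df)          # H0: diff >=  bound
    return diff, max(p_lower, p_upper) < alpha              # equivalent at level alpha?

rng = np.random.default_rng(2)
clinical = rng.normal(0.2, 1.0, 40)                         # hypothetical treated sample
normative = rng.normal(0.0, 1.0, 60)                        # hypothetical normative sample
print(schuirmann_yuen_tost(clinical, normative, bound=0.5))
```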

5.
Frender and Doubilet suggest that Bousfield's ratio of repetitions (RR) is the best measure of clustering in free recall presently available. Conditioning only on the number of words recalled, they determine the mean of RR in the absence of clustering. In this note the null variance of RR is presented. This permits development of conservative significance tests based on the Cramér inequality.
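
A short sketch of Bousfield's ratio of repetitions for a recall protocol; the analytic null mean and variance derived in the note are not reproduced here, so the sketch substitutes a permutation null, and the category labels and recall order are made up.

```python
# Illustrative sketch (not from the note): Bousfield's ratio of repetitions,
# RR = R / (N - 1), for a recall protocol, with a permutation null standing in
# for the analytic mean and variance derived in the paper.
import numpy as np

def ratio_of_repetitions(categories):
    """R / (N - 1): proportion of adjacent recalls drawn from the same category."""
    pairs = zip(categories[:-1], categories[1:])
    r = sum(a == b for a, b in pairs)
    return r / (len(categories) - 1)

recall_order = list("AABBBACCA")             # hypothetical category labels in recall order
observed = ratio_of_repetitions(recall_order)

rng = np.random.default_rng(3)
null = [ratio_of_repetitions(list(rng.permutation(recall_order))) for _ in range(5000)]
p = np.mean([v >= observed for v in null])   # one-sided: more clustering than chance?
print(f"RR = {observed:.2f}, permutation p = {p:.3f}")
```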

6.
Previous research has shown that age and education have a significant effect on neuropsychological test scores among normals, but that these effects are sharply diminished, or perhaps totally obliterated, among adults with brain damage and children with brain damage. These findings would have their major practical clinical significance in limiting the use of age and/or education adjustments of raw scores for subjects with brain damage, especially if the adjustments were based on data derived from the study of normal subjects. The present investigation studied the effects of age and education on the Neuropsychological Deficit Scale (NDS) for Older Children scores of children with learning disabilities, aged 9 through 14 years. No significant effects among age and education variables and NDS scores were found. In fact, younger and older subgroups as well as lower- and higher-educated subgroups earned mean NDS scores that were not significantly different. It appears that the neuropsychological consequences of learning disabilities override the effects of age and education in the 9- through 14-year age range.

7.
An important application of psychological principles involves increasing intentions to engage in activities that, although admittedly beneficial, are often not initially appealing (e.g., studying, quitting smoking, dieting). The present study tests the utility of directed thinking as a tool for eliciting intentions to engage in such activities. Undergraduate students were directed to think either about the reasons why people should find studying enjoyable or about the actions that people might take to make studying enjoyable. Regardless of whether they thought as individuals or in cooperating dyads, students who thought about actions later reported greater intentions to spend time studying than did students who thought about reasons. The results have both theoretical and practical significance.

8.
Randomization tests are a class of nonparametric statistics that determine the significance of treatment effects. Unlike parametric statistics, randomization tests do not assume a random sample, or make any of the distributional assumptions that often preclude statistical inferences about single‐case data. A feature that randomization tests share with parametric statistics, however, is the derivation of a p‐value. P‐values are notoriously misinterpreted and are partly responsible for the putative “replication crisis.” Behavior analysts might question the utility of adding such a controversial index of statistical significance to their methods, so it is the aim of this paper to describe the randomization test logic and its potentially beneficial consequences. In doing so, this paper will: (1) address the replication crisis as a behavior analyst views it, (2) differentiate the problematic p‐values of parametric statistics from the, arguably, more useful p‐values of randomization tests, and (3) review the logic of randomization tests and their unique fit within the behavior analytic tradition of studying behavioral processes that cut across species.
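
A minimal sketch of the randomization-test logic described above, applied to a hypothetical single-case alternating-treatments design in which session order was randomly assigned; the scores and design are invented for illustration.

```python
# Hedged sketch of a randomization test for a single-case alternating-treatments
# design in which session order was randomly assigned. Data are hypothetical.
import itertools
import numpy as np

scores = np.array([4, 7, 5, 9, 6, 8, 5, 9])      # one score per session
treatment = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # the assignment actually used

def effect(assign):
    return scores[assign == 1].mean() - scores[assign == 0].mean()

observed = effect(treatment)

# Enumerate every admissible assignment (4 treatment and 4 control sessions);
# the p-value is the share of assignments with an effect at least as extreme.
n = len(scores)
count = total = 0
for ones in itertools.combinations(range(n), int(treatment.sum())):
    assign = np.zeros(n, dtype=int)
    assign[list(ones)] = 1
    total += 1
    count += effect(assign) >= observed
print(f"observed effect = {observed:.2f}, randomization p = {count / total:.3f}")
```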

9.
The Semantic Verification Test (SVT) consists of brief, simple statements, which are either true or false, about various arrangements of the letters ABC (e.g. B after A). Mean reaction time (RT) for confirming or disconfirming the various statements varies according to their complexity. In independent studies of university students and Navy recruits, RT and other response latency parameters (intraindividual variability and movement time) from SVT performance show significant correlations of about -0.40 with nonspeeded tests of psychometric g. The mean RTs of adults to the various SVT item types are highly related to the mean error rates on these item types when the SVT is taken by elementary school children as a nonspeeded paper-and-pencil test. RT is correlated with the general cognitive ability factor (g) and not with the test-taking speed factor that is found in speeded paper-and-pencil tests. The degree of correlation between RT and psychometric g does not show any regular relationship to differences in the SVT item types' complexity or difficulty as indicated by mean RT.

10.
Dreyfus and Rubin's commentary on Division II of Being and Time raises three closely related puzzles about the possibility of authenticity: (i) how could Dasein ever choose to become authentic, (ii) how could authentic Dasein ever choose to take up any particular possibility, and (iii) how could anything matter to authentic Dasein? They argue that Heidegger has a convincing answer to the first two puzzles, but they find his answer to the third “indirect and not totally convincing” (D&R, p. 332). I argue that they should find Heidegger's answer to the third puzzle far worse than “not totally convincing”, given their interpretation of his account of anxiety, and that the answers they claim he has in response to the first two puzzles are not supported by the text. I then show that the puzzles arise from distortions in Dreyfus and Rubin's interpretation of Heidegger's account of anxiety. The puzzles dissolve once the distortions are identified.

11.
On the relationship between autobiographical memory and perceptual learning (cited by 35: 0 self-citations, 35 by others)
Although the majority of research on human memory has concentrated on a person's ability to recall or recognize items as having been presented in a particular situation, the effects of memory are also revealed in a person's performance of a perceptual task. Prior experience with material can make that material more easily identified or comprehended in perceptually difficult situations. Unlike with standard retention tests, effects of prior experience on a perceptual task do not logically require that a person be aware that he or she is remembering. Indeed, amnesic patients purportedly show effects of practice in their subsequent performance of a perceptual or motor task even though they profess that they do not remember having engaged in that prior experience. The experiments that are reported were designed to explore the relationship between the more aware autobiographical form of memory that is measured by a recognition memory test and the less aware form of memory that is expressed in perceptual learning. Comparisons of effects on perceptual learning and recognition memory reveal two classes of variables. Variables such as the level of processing of words during study influenced recognition memory, although they had no effect on subsequent perceptual recognition. A study presentation of a word had as large an effect on its later perceptual recognition when recognition memory performance was very poor as it did when recognition memory performance was near perfect. In contrast, variables such as the number and the spacing of repetitions produced parallel effects on perceptual recognition and recognition memory. Following Mandler and others, it is suggested that there are two bases for recognition memory. If an item is readily perceived so that it seems to "jump out" from the page, a person is likely to judge that he or she has previously seen the item in the experimental situation. Variables that influence ease of perceptual recognition, then, can also have an effect on recognition memory, so parallel effects are found. The second basis for recognition memory involves elaboration of a word's study context and depends on such factors as level of processing during study, factors that are not important for perceptual recognition of isolated words. Comparisons of perceptual recognition and recognition memory are shown to be useful for determining how a variable has its effect. Effects of study on perceptual recognition appear to be totally due to memory for physical or graphemic information. Results reported are also relevant to theories of perceptual learning. A single presentation of an item is shown to have large and long-lasting effects on its later perceptual recognition. At least partially, effects of study on perceptual recognition depend on the same variables as do effects on more standard memory tests.

12.
13.
This article presents a critique of the concept of randomness as it occurs in the psychological literature. The first section of our article outlines the significance of a concept of randomness to the process of induction; we need to distinguish random and non-random events in order to perceive lawful regularities and formulate theories concerning events in the world. Next we evaluate the psychological research that has suggested that human concepts of randomness are not normative. We argue that, because the tasks set to experimental subjects are logically problematic, observed biases may be an artifact of the experimental situation and that, even if such biases do generalise, they may not have pejorative implications for induction in the real world. Thirdly we investigate the statistical methodology utilised in tests for randomness and find it riddled with paradox. In a fourth section we find various branches of scientific endeavour that are stymied by the problems posed by randomness. Finally we briefly mention the social significance of randomness and conclude by arguing that such a fundamental concept merits and requires more serious consideration.
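
As a concrete example of the statistical tests for randomness that the article critiques, here is a sketch of the Wald-Wolfowitz runs test on a binary sequence; the sequence and the use of the normal approximation are illustrative assumptions, not drawn from the article.

```python
# Sketch of one common test for randomness: the Wald-Wolfowitz runs test on a
# binary sequence. The sequence is made up for illustration.
import numpy as np
from scipy import stats

seq = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical
n1, n2 = int(seq.sum()), int(len(seq) - seq.sum())
runs = 1 + int(np.sum(seq[1:] != seq[:-1]))       # number of runs in the sequence

# Normal approximation to the null distribution of the run count
mu = 2 * n1 * n2 / (n1 + n2) + 1
var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z = (runs - mu) / np.sqrt(var)
p = 2 * stats.norm.sf(abs(z))                     # two-sided: too few or too many runs
print(f"runs = {runs}, z = {z:.2f}, p = {p:.3f}")
```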

14.
Mediation is said to occur when a causal effect of some variable X on an outcome Y is explained by some intervening variable M. The authors recommend that with small to moderate samples, bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation. Bootstrap tests are powerful because they detect that the sampling distribution of the mediated effect is skewed away from 0. They argue that R. M. Baron and D. A. Kenny's (1986) recommendation of first testing the X --> Y association for statistical significance should not be a requirement when there is a priori belief that the effect size is small or suppression is a possibility. Empirical examples and computer setups for bootstrap analyses are provided.
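
A minimal sketch of the bootstrap approach recommended above: resample cases, re-estimate the a (X to M) and b (M to Y, controlling X) paths, and form a percentile confidence interval for the indirect effect a*b. The simulated data set and the number of resamples are assumptions made for illustration.

```python
# Hedged sketch of a percentile bootstrap test of the indirect (mediated) effect.
# The data-generating model and sample size are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n = 80
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)                 # X -> M path (a)
y = 0.35 * m + 0.1 * x + rng.normal(size=n)      # M -> Y path (b), plus a direct effect

def indirect(idx):
    a = np.polyfit(x[idx], m[idx], 1)[0]                     # slope of M on X
    b = np.linalg.lstsq(np.column_stack([m[idx], x[idx], np.ones(len(idx))]),
                        y[idx], rcond=None)[0][0]            # slope of Y on M, controlling X
    return a * b

boot = np.array([indirect(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")       # mediation if CI excludes 0
```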

15.
Study objectives were to clarify children’s gender-based implicit and explicit mathematics and reading stereotypes, and to determine if implicit and explicit measures were related or represented distinct constructs. One hundred and fifty-six boys and girls (mean age 11.3 years) from six elementary schools completed math or reading stereotype measures. Results for the implicit measures showed that children believed their own gender was superior in mathematics ability, and that girls but not boys believed that girls were better in reading. Explicit measures revealed that girls but not boys believed they were superior at math, and that girls and boys believed girls were better readers than boys. Implicit and explicit measures were not related. Results are discussed in relation to previous studies on children’s mathematics and reading gender stereotypes and large scale tests of mathematics and reading achievement. Educational and research implications are discussed.

16.
Significance tests are not the only step in statistics. Other considerations include effect sizes and adequate sample sizes for a respectable level of statistical power. However, many statistical packages are spotty in their coverage of effect size measures, are complex, and lack a friendly interface. Textbooks may have limited coverage, and calculations entail several formulas and tables. Power & Effect offers a calculator- and formula-based metaphor for computing popular measures of effect size, simple significance tests between effect sizes, combining of effect sizes, simple significance tests based on known statistical values, and sample size determinations based on predicted results or effect size.
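
Power & Effect itself is a calculator-style program; as a rough illustration of the kinds of quantities it computes, here is a sketch of Cohen's d from two samples and post hoc power for an independent-samples t test via the noncentral t distribution. The numbers are invented, and the formulas are the standard textbook ones rather than the program's own routines.

```python
# Illustrative sketch: Cohen's d and two-sided power for an independent-samples
# t test with equal group sizes, using the noncentral t distribution.
import numpy as np
from scipy import stats

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                     / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled

def power_two_sample(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)                 # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

rng = np.random.default_rng(5)
a, b = rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40)  # hypothetical samples
d = cohens_d(a, b)
print(f"d = {d:.2f}, power at n = 40 per group: {power_two_sample(d, 40):.2f}")
```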

17.
Two studies examined situational determinants of choice among anagram tests that varied both in difficulty and in diagnosticity (the information they provided about one's own ability). In both studies, subjects worked on a preliminary anagram test before making their choices. Study 1 manipulated level of performance on the preliminary test. Results showed that high performance led to preferring more difficult and more diagnostic tests. In Study 2, subjects were either paid or not paid for their performance on the preliminary test. Results showed that pay led to a preference for more diagnostic tests. Unexpectedly, results of both studies showed that although difficulty and diagnosticity were defined independently of one another, they were not perceived as such. Thus, highly diagnostic tests were perceived as more difficult; more difficult tests were perceived as more diagnostic; and the differences between high- and low-diagnosticity tests in perceived diagnosticity and choice of items (highly diagnostic tests had higher scores on both measures) were more pronounced among more difficult tests. Motivational as well as cognitive interpretations of the results were discussed.

18.
Preliminary tests of equality of variances used before a test of location are no longer widely recommended by statisticians, although they persist in some textbooks and software packages. The present study extends the findings of previous studies and provides further reasons for discontinuing the use of preliminary tests. The study found Type I error rates of a two‐stage procedure, consisting of a preliminary Levene test on samples of different sizes with unequal variances, followed by either a Student pooled‐variances t test or a Welch separate‐variances t test. Simulations disclosed that the two-stage procedure fails to protect the significance level and usually makes the situation worse. Earlier studies have shown that preliminary tests often adversely affect the size of the test, and also that the Welch test is superior to the t test when variances are unequal. The present simulations reveal that changes in Type I error rates are greater when sample sizes are smaller, when the difference in variances is slight rather than extreme, and when the significance level is more stringent. Furthermore, the validity of the Welch test deteriorates if it is used only on those occasions where a preliminary test indicates it is needed. Optimum protection is assured by using a separate‐variances test unconditionally whenever sample sizes are unequal.
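
A small Monte Carlo sketch of the two-stage procedure examined above: run a preliminary Levene test, choose the pooled-variances or Welch t test accordingly, and track the empirical Type I error rate against always using Welch. The sample sizes, variance ratio, alpha, and number of replications are illustrative assumptions, not the study's design.

```python
# Hedged simulation sketch: Type I error of "Levene first, then t or Welch"
# versus using the Welch test unconditionally. All settings are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
alpha, reps = 0.05, 5000
n1, n2, sd1, sd2 = 15, 45, 2.0, 1.0            # unequal n paired with unequal variances

rejections = {"two-stage": 0, "Welch always": 0}
for _ in range(reps):
    x = rng.normal(0, sd1, n1)                 # equal means: any rejection is a Type I error
    y = rng.normal(0, sd2, n2)
    equal_var = stats.levene(x, y).pvalue >= alpha          # preliminary test
    p_two_stage = stats.ttest_ind(x, y, equal_var=equal_var).pvalue
    p_welch = stats.ttest_ind(x, y, equal_var=False).pvalue
    rejections["two-stage"] += p_two_stage < alpha
    rejections["Welch always"] += p_welch < alpha

for name, k in rejections.items():
    print(f"{name}: empirical Type I error = {k / reps:.3f}")
```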

19.
Many empirical studies measure psychometric functions (curves describing how observers’ performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel–Haenszel test, a generalization of the Berry–Mielke test that was developed here, and a split variant of the generalized Mantel–Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in matlab and R to conduct these tests is available as Electronic Supplemental Material.
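
In the simplest special case of two conditions with binary (correct/incorrect) responses at shared stimulus levels, the Mantel–Haenszel idea reduces to the familiar Cochran–Mantel–Haenszel test stratified by stimulus level; the sketch below shows only that special case, with invented counts, and does not reproduce the paper's generalized or split variants.

```python
# Hedged sketch: Cochran-Mantel-Haenszel test of equality of two psychometric
# functions, stratifying by stimulus level. Counts are hypothetical.
import numpy as np
from scipy import stats

# correct / total per stimulus level, for conditions A and B (hypothetical data)
correct_a = np.array([3, 8, 14, 18, 20]); n_a = np.full(5, 20)
correct_b = np.array([2, 6, 10, 16, 19]); n_b = np.full(5, 20)

num = var = 0.0
for ca, na, cb, nb in zip(correct_a, n_a, correct_b, n_b):
    m1, total = ca + cb, na + nb                      # correct / grand totals per stratum
    expected = na * m1 / total
    num += ca - expected
    var += na * nb * m1 * (total - m1) / (total ** 2 * (total - 1))

chi2 = num ** 2 / var                                 # CMH statistic, 1 df
p = stats.chi2.sf(chi2, df=1)
print(f"CMH chi2 = {chi2:.2f}, p = {p:.3f}")          # small p: functions differ
```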

20.
Frank Restle, Psychometrika, 1961, 26(3), 291-306
A theory of cue learning, which gives rise to a system of recurrent events in Feller's sense, is analyzed mathematically. The distribution of total errors and sampling distribution of mean errors are derived, and the learning curve is investigated. Maximum likelihood estimates of parameters and sampling variances of those estimates are derived. Likelihood ratio tests of the usual null hypotheses and approximate tests of goodness of fit of substantive hypotheses are developed. The distinguishing characteristic of these tests is that they are concerned with meaningful parameters of the learning process.This research was facilitated by the writer's tenure as Faculty Research Fellow, Social Science Research Council, 1959–1961.
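
As a toy stand-in for the kind of analysis described (not Restle's actual model), here is maximum likelihood estimation and a likelihood ratio test for a one-parameter all-or-none learning model in which each subject's total errors follow a geometric distribution; the data and the null value are hypothetical.

```python
# Toy illustration only, not Restle's cue-learning model: ML estimation and a
# likelihood ratio test for a one-parameter learning model where
# P(k total errors) = c * (1 - c)**k, k = 0, 1, 2, ...
import numpy as np
from scipy import stats

errors = np.array([3, 1, 0, 5, 2, 2, 4, 0, 1, 3])    # hypothetical total errors per subject

def neg_log_lik(c, data):
    return -np.sum(np.log(c) + data * np.log(1 - c))

c_hat = 1.0 / (1.0 + errors.mean())                  # closed-form MLE for this toy model
c_null = 0.5                                         # a substantive null hypothesis to test
lr = 2 * (neg_log_lik(c_null, errors) - neg_log_lik(c_hat, errors))
p = stats.chi2.sf(lr, df=1)                          # likelihood ratio test, 1 df
print(f"c_hat = {c_hat:.2f}, LR = {lr:.2f}, p = {p:.3f}")
```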

