Similar literature
Found 20 similar articles (search time: 15 ms)
1.
It is well known that coefficient alpha can be used to estimate the reliability of a test even when the test is split into several parts. It is also known that alpha can severely underestimate test reliability when the several parts have an unequal number of items. A generalization of alpha, β_k, is proposed to correct this defect. Several properties of β_k are also presented. The author gratefully acknowledges the assistance of Dr. Leonard Feldt for reviewing an earlier draft of this paper, and Ms. Rita Karwacki Bode and Mr. Dave Mansell for the analysis of the experimental data reported here. The comments of an unknown referee which contributed substantially to the clarity of the presentation are also gratefully acknowledged.
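
The underestimation described in this abstract can be illustrated with a minimal sketch. It computes the standard coefficient alpha from part scores, alpha = k/(k-1) × (1 − Σ var(part) / var(total)); it does not implement β_k, whose formula the abstract does not give. The function name `coefficient_alpha` and the score data are hypothetical, chosen only for illustration.

```python
from statistics import variance

def coefficient_alpha(parts):
    """Coefficient alpha from part scores.

    parts: list of k lists; each inner list holds one part's score
    for every examinee (same examinee order in every part).
    """
    k = len(parts)
    # Total score per examinee = sum of that examinee's part scores
    totals = [sum(scores) for scores in zip(*parts)]
    part_var_sum = sum(variance(p) for p in parts)
    return k / (k - 1) * (1 - part_var_sum / variance(totals))

# Hypothetical error-free scores: part B is three times as long as part A,
# so its scores are exactly three times part A's scores.
part_a = [1, 2, 3, 4]
part_b = [3, 6, 9, 12]

alpha_equal = coefficient_alpha([part_a, part_a])    # two equal-length parts
alpha_unequal = coefficient_alpha([part_a, part_b])  # unequal-length parts
print(alpha_equal, alpha_unequal)
```

With these made-up data the parts contain no measurement error at all, yet alpha is about 1.0 for the equal-length split and only about 0.75 for the unequal split, which is the kind of defect the proposed generalization is meant to correct.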

2.
Abstract: It is often necessary to predict the scores of interest or their variation. Ishii and Watanabe (2001) investigated, in the context of psychological measurement, the Bayesian predictive distribution of a new subject’s scores for tests and subjects’ scores for a new test. In this paper, the Bayesian posterior predictive distribution of a new subject’s scores for a new parallel test was considered, and the effects of the number of subjects, the number of tests, and the test reliability were investigated. It was found that, under the assumption that the (co)variance parameters are known, the predictive variance of a new subject’s score for a new test was equal to the predictive variances of the new subject’s scores for the existing tests. It was also found that the effect of the number of subjects was relatively large and the effect of the number of tests was relatively small when a new subject’s scores for existing tests were not observed.

3.
4.
ABSTRACT

This study presents Danish data for the Symbol Digit Modalities Test (SDMT), Color Trails Test (CTT), and a modified Stroop test from 100 subjects aged 60–87 years. Among the included demographic variables, age had the highest impact on test performances. Thus, the study presents separate data for different age groups. For SDMT and CTT1, Danish Adult Reading Test (DART) score also had a significant impact on test performances. The incongruent version of the modified Stroop test was significantly correlated with education. Moderate and significant correlations were found between the three tests. Even though the three tests are commonly used, few normative data for the elderly exist. SDMT and CTT performances from this study were in the same range as previously published international norms, but the validity of the result from the modified Stroop test could not be investigated.

5.
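A study of the correlation between spatial representation and geometric ability in 4th- and 5th-grade students   Cited by: 4 (self-citations: 1, citations by others: 3)
徐凡 (Xu Fan), 施建农 (Shi Jiannong). 《心理学报》 (Acta Psychologica Sinica), 1992, 25(1): 22-29
This study is part of a larger project on the relationship between students' spatial ability and geometric ability. With 117 fourth- and fifth-grade primary school students as subjects, and a spatial representation test and a geometric ability test as instruments, it offers a preliminary exploration of the relationship between students' spatial representation and geometric ability. Analysis of the data showed: (1) in terms of total scores, fifth graders scored significantly higher than fourth graders on spatial representation, but not every aspect of spatial representation showed a significant grade difference; (2) in terms of total scores on the spatial test, the correlation between spatial scores and geometry scores was significant whether the fourth and fifth grades were examined separately or together, but not every subtest of the spatial test was significantly correlated with geometry scores; students' geometry scores can be predicted to some extent by the regression equation Y_i = 0.5736X_i + 0.7635.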
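
As a small worked illustration of the regression equation reported in this abstract, the sketch below evaluates Y_i = 0.5736X_i + 0.7635 for a given X. It assumes that X_i is a student's spatial test score and Y_i the predicted geometry score, which the abstract implies but does not state outright; the function name and the input value are hypothetical.

```python
# Coefficients as reported in the abstract
SLOPE = 0.5736
INTERCEPT = 0.7635

def predict_geometry_score(spatial_score):
    """Predicted geometry score Y for a spatial score X (assumed roles of X and Y)."""
    return SLOPE * spatial_score + INTERCEPT

# Hypothetical spatial score of 10
predicted = predict_geometry_score(10)
print(round(predicted, 4))
```

The abstract does not report the score scales or the fit of the regression, so the predicted value is only as meaningful as those unreported details allow.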

6.
BACKGROUND: Apraxia is a neurologically induced deficit in the ability to perform purposeful skilled movements. One of the most common forms is ideomotor apraxia (IMA), where spatial and temporal production errors are most prevalent. IMA can be associated with Alzheimer's disease (AD), even early in its course, but is often not identified, possibly because the evaluation of IMA by inexperienced judges using performance tests is unreliable. The purpose of this study, therefore, is to learn if the Postural Knowledge Test (PKT), a praxis discrimination test that assesses knowledge of transitive (PKT-T subtest) and intransitive (PKT-I subtest) postures and does not require extensive training, is as sensitive and specific as the praxis performance tests. METHODS: We studied 15 subjects with probable AD as well as 18 age-matched controls by having them perform transitive and intransitive gestures to command and imitation, as well as having them discriminate between correct and incorrect transitive and intransitive postures. RESULTS: Overall on all tests, the control subjects performed better than those with AD. In addition, all subjects had more trouble with transitive than intransitive gestures. Using a stepwise discriminative analysis, 81.8% of the subjects could be classified according to Group (94.4% of Controls, 66.7% of AD subjects). In this analysis, the PKT-T (transitive posture subtest) was the only measure that contributed to the discrimination of subjects. CONCLUSION: We found that having subjects select the correct transitive hand postures in this "booklet test" was more sensitive than grading their praxis performances even when using judges with extensive training. This suggests that this discrimination test might be an excellent means for diagnosing and screening patients for AD. The reason why recognition of transitive postures is relatively more difficult for our AD subjects is not known. Two possibilities are that the representations for intransitive movements are stronger than those for transitive movements, and hence, more resistant to degradation, or that intransitive acts are stored in parts of the brain not affected by AD.

7.
In some popular test designs (including computerized adaptive testing and multistage testing), many item pairs are not administered to any test takers, which may result in some complications during dimensionality analyses. In this paper, a modified DETECT index is proposed in order to perform dimensionality analyses for response data from such designs. It is proven in this paper that under certain conditions, the modified DETECT can successfully find the dimensionality-based partition of items. Furthermore, the modified DETECT index is decomposed into two parts, which can serve as indices of the reliability of results from the DETECT procedure when response data are judged to be multidimensional. A simulation study shows that the modified DETECT can successfully recover the dimensional structure of response data under reasonable specifications. Finally, the modified DETECT procedure is applied to real response data from two-stage tests to demonstrate how to utilize these indices and interpret their values in dimensionality analyses.

8.
Abstract: Subjects learned pictorial material in anticipation of either free recall (FR), serial recall (SR), or recognition tests. A design containing all possible combinations of anticipated test and test actually given was used. SR and recognition performance was best when subjects anticipated these tests, respectively, whereas FR performance was best when an SR test was anticipated. Anticipation of recognition tended to interfere with SR performance, and vice versa. The results indicate that subjects encode pictorial material differently in anticipation of different retention tests; that this serves to facilitate or to impair performance on the anticipated and/or other retention tests in a predictable manner; and that subjects tend to use different information from the stimuli to pass recognition tests and to pass recall (FR or SR) tests.

9.
In two experiments with categorized lists, we asked whether the testing effect in free recall is related to enhancements in organizational processing. During a first phase in Experiment 1, subjects studied one list over eight consecutive trials, they studied another list six times while taking two interspersed recall tests, and they learned a third list in four alternating study and test trials. On a test 2 days later, recall was directly related to the number of tests and inversely related to the number of study trials. In addition, increased testing enhanced both the number of categories accessed and the number of items recalled from within those categories. One measure of organization also increased with the number of tests. In a second experiment, different groups of subjects studied a list either once or twice before a final criterial test, or they studied the list once and took an initial recall test before the final test. Prior testing again enhanced recall, relative to studying, on the final test a day later, and also improved category clustering. The results suggest that the benefit of testing in free recall learning arises because testing creates retrieval schemas that guide recall.

10.
A long-standing assumption about the framing effect is that if the participants discover the purpose of the experiment in a within-subject design, then this test transparency would trigger them to override their initial answer and make coherent choices. For this reason, researchers try to mask the connection between the two parts of the test by inserting filler questions or a time delay between the two parts of the test. In this research, we explored the extent to which these customarily used masking solutions are effective in increasing test sensitivity for the framing effect. In three experiments, we assessed the effect of masking on the tests of the attribute framing and the risky-choice framing effects. Contradicting the general belief, our results indicate that these effects are already measurable without any masking or delay, and we found no convincing evidence that the attempts to decrease task transparency provide worthwhile benefits for general tests of the effect. Beyond their practical relevance, the results question whether the test is a good measure of coherence rationality and better suit those accounts that suggest that the two parts of the framing tasks cannot be regarded as identical. Copyright © 2017 John Wiley & Sons, Ltd.

11.
To assess the reliability of congeneric tests, specifically designed reliability measures have been proposed. This paper emphasizes that such measures rely on a unidimensionality hypothesis, which can neither be confirmed nor rejected when there are only three test parts, and will invariably be rejected when there are more than three test parts. Jackson and Agunwamba's (1977) greatest lower bound to reliability (glb) is proposed instead. Although this bound has a reputation for overestimating the population value when the sample size is small, this is no reason to prefer the unidimensionality-based reliability. Firstly, the sampling bias problem of the glb does not play a role when the number of test parts is small, as is often the case with congeneric measures. Secondly, glb and unidimensionality-based reliability are often equal when there are three test parts, and when there are more test parts, their numerical values are still very similar. To the extent that the bias problem of the greatest lower bound does play a role, unidimensionality-based reliability is equally affected. Although unidimensionality and reliability are often thought of as unrelated, this paper shows that, from at least two perspectives, they act as antagonistic concepts. A measure, based on the same framework that led to the greatest lower bound, is discussed for assessing how close a set of variables is to unidimensionality. It is the percentage of common variance that can be explained by a single factor. An empirical example is given to demonstrate the main points of the paper. The authors are obliged to Henk Kiers for commenting on a previous version. Gregor Sočan is now at the University of Ljubljana.

12.
Abstract

Contrary to conventional educational testing, in so-called dynamic assessment subjects are allowed to consult help during testing or are offered prior training. The differential results of both testing procedures are sometimes ascribed to the idea that dynamic tests reflect the breadth of the zone of proximal development on top of independent achievement. Alternative explanations claim that conventional tests are more strongly biased towards various characteristics of persons, which have a negative influence on performance, when compared to dynamic tests. In this study, it was hypothesised that static as well as dynamic assessment is biased towards anxious tendencies of subjects, but the former more strongly than the latter. In order to investigate this supposition, the performance of subjects on dynamic and static tests was systematically compared and related to measures of test anxiety in a longitudinal experiment. In the experiment, repeated measures of independent mathematics achievement as well as mathematics learning potential were gathered among students of secondary education in the Netherlands. Prior to every mathematics test, subjects filled out a test anxiety questionnaire. After every mathematics test, subjects filled out a general state anxiety questionnaire. The participating subjects were students from secondary education, either preparing for higher vocational training or university, aged approximately 15 years on average.

The results of the experiment showed that lack of self-confidence is an important constituent factor of test anxiety, apart from worry and emotionality. The data supported the assumption that dynamic testing procedures are less biased towards anxiety than conventional tests, but it was not established that dynamic testing procedures render results that are not biased by test anxious tendencies.

13.
In some situations where reliability must be estimated it is impossible to divide the measuring instrument into more than two separately scoreable parts. When this is the case, the parts may be homogeneous in content but clearly unequal in length. The resultant scores will not be essentially τ-equivalent, and hence total test reliability cannot be satisfactorily estimated via Cronbach's coefficient alpha. Limitation on the number of parts rules out Kristof's three-part approach. A technique is developed for estimating reliability in such situations. The approach is shown to function very well when applied to five achievement tests.

14.
The purpose of this study was to assess retrieval strategy in incidental, intentional, and inclusion tests with word-fragment cues following a levels-of-processing manipulation at study. The results of Exp. 1 showed small levels-of-processing effects in incidental tests, and most subjects reported involuntary rather than voluntary retrieval of study-list words. In an intentional test, although levels of processing had a much greater effect, quite a few subjects also reported involuntary rather than voluntary retrieval of study-list words, and these subjects showed a smaller effect of levels of processing than subjects reporting voluntary retrieval. These results suggest that subjects given instructions for both voluntary and involuntary retrieval of study-list words in an inclusion test might not in fact attempt voluntary retrieval at all, but simply adopt an involuntary retrieval strategy. The results of Exp. 2 provided evidence to support this suggestion. The general implication is that where test contamination refers to subjects' failure to use retrieval strategies in accordance with test instructions, inclusion tests can be contaminated, as well as incidental or intentional tests, and that it is always necessary to obtain converging evidence about the actual strategies subjects use.

15.
Factorial results are affected by selection of subjects and by selection of tests. It is shown that the addition of one or more tests which are linear combinations of tests already in a battery causes the addition of one or more incidental factors. If the given test battery reveals a simple structure, the addition of tests which are linear combinations of the given tests leaves the structure unaffected unless the number of incidental factors is so large that the common factors become indeterminate.

16.
Two studies examined situational determinants of choice among anagram tests that varied both in difficulty and in diagnosticity (the information they provided about one's own ability). In both studies, subjects worked on a preliminary anagram test before making their choices. Study 1 manipulated level of performance on the preliminary test. Results showed that high performance led to preferring more difficult and more diagnostic tests. In Study 2, subjects were either paid or not paid for their performance on the preliminary test. Results showed that pay led to a preference for more diagnostic tests. Unexpectedly, results of both studies showed that although difficulty and diagnosticity were defined independently of one another, they were not perceived as such. Thus, high diagnostic tests were perceived as more difficult; more difficult tests were perceived as more diagnostic; and the differences between high and low diagnostic tests in perceived diagnosticity and choice of items (high diagnostic tests had higher scores on both measures) were more pronounced among more difficult tests. Motivational as well as cognitive interpretations of the results are discussed.

17.
18.
We propose a simple modification of Hochberg's step-up Bonferroni procedure for multiple tests of significance. The proposed procedure is always more powerful than Hochberg's procedure for more than two tests, and is more powerful than Hommel's procedure for three and four tests. A numerical analysis of the new procedure indicates that its Type I error is controlled under independence of the test statistics, at a level equal to or just below the nominal Type I error. Examination of various non-null configurations of hypotheses shows that the modified procedure has a power advantage over Hochberg's procedure which increases with the number of false hypotheses.
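
For orientation, the baseline that this abstract modifies can be sketched as follows. This is the standard Hochberg step-up procedure, not the modified procedure the abstract proposes (its details are not given here): with the p-values ordered p(1) ≤ ... ≤ p(m), find the largest k such that p(k) ≤ α/(m − k + 1) and reject the hypotheses with the k smallest p-values. The function name `hochberg_reject` and the example p-values are hypothetical.

```python
def hochberg_reject(pvalues, alpha=0.05):
    """Standard Hochberg step-up procedure.

    Returns a list of booleans, in the input order, marking which
    hypotheses are rejected at familywise level alpha.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    # Step up: start at the largest p-value (rank m) and move downward
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        if pvalues[idx] <= alpha / (m - rank + 1):
            # Reject this hypothesis and every one with a smaller p-value
            for j in order[:rank]:
                reject[j] = True
            break
    return reject

# Hypothetical p-values for three tests
decisions = hochberg_reject([0.06, 0.02, 0.01], alpha=0.05)
print(decisions)
```

In this made-up example the largest p-value (0.06) exceeds 0.05, but the second largest (0.02) is below 0.05/2, so the two hypotheses with the smaller p-values are rejected.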

19.
20.
This study investigates the relationship between a number of measures of speed of cognitive information-processing and intelligence test scores. One hundred university students were given five tests of speed-of-processing, measuring their speed of encoding, short-term memory scanning, long-term memory retrieval, efficiency of short-term memory storage and processing, and simple and choice reaction time or decision-making speed. They were also given the Wechsler Adult Intelligence Scale (WAIS) and the Raven Advanced Progressive Matrices. A number of multiple regression analyses show that the cognitive processing measures are significantly related to IQ scores. Other analyses indicate that this relationship cannot be attributed to the common content shared by the reaction time and the intelligence tests, nor to the fact that parts of the WAIS are timed. It is concluded that the reaction time tests measure basic cognitive operations which are involved in many forms of intellectual behavior, and that individual differences in intelligence can be attributed, to a moderate extent, to variance in the speed or efficiency with which individuals can execute these operations.
