Similar articles
Found 20 similar articles (search time: 15 ms)
1.
HORST P 《Psychometrika》1948,13(3):125-134
A battery of pencil-and-paper tests is commonly used for predicting a single criterion. If the score on each test is the number of correct answers, the composite battery score would normally be the sum of the weighted test scores, where the weights are the raw score regression weights. Knowing the reliability of each test, it is possible to alter the lengths of the tests in a manner such that the weights will all be equal. The composite battery score would then simply be the total number of items answered correctly and scoring would be greatly simplified. Such simplification is particularly desirable where the volume of testing is large. Section I of the article outlines the procedure for altering the lengths of the tests, and Section II gives a proof of the method.

2.
TAYLOR CW 《Psychometrika》1950,15(4):391-406
For any fixed total time of testing it is possible, through proper item-and-time allotment, to combine tests into a battery so that the multiple correlation with a pre-assigned criterion will be maximized. By holding constant the ratio of the length in number of items to the time length for each test, a set of general equations has been derived which will yield this maximum value of the multiple R and will enable one to determine, in any given case, the optimal fraction of total testing time that should be devoted to each type of test under consideration. The set of general equations is applied to a two-test-battery problem to obtain the optimal length of each type of test for one hour total testing time. If two other tests had been selected for the two-test sample problem, different subdivisions of the total time would generally occur. The manner in which the results would change when using other tests with different initial reliability, validity, and intercorrelation values is briefly presented. Some general implications of this method of battery development are also discussed. The writer is indebted to Max Woodbury for his assistance and especially to Dr. N. J. F. Van Steenberg and Dr. Anna S. Henriques, who provided valuable guidance and aid in the development of the solution to this problem. This paper is a revision of a thesis submitted in 1939 at the University of Utah in partial fulfillment of the requirements for the master's degree.

3.
HORST P 《Psychometrika》1949,14(2):79-88
If the lengths of the tests in a battery are altered, their intercorrelations and their validities or correlations with a criterion are also altered. Consequently, the multiple correlation of the battery with the criterion will also be altered. These changes are a function of the reliabilities of the tests. Suppose we are given, from a set of experimental data, (1) the time allowed for each test in the battery, (2) the reliability of each test, (3) the intercorrelations, and (4) the validities of all the tests. If we specify the over-all testing time we are willing to allow for the test in the future, we can determine the amount by which each test must be altered in order to give the maximum multiple correlation with the criterion. The method is presented, together with numerical examples and the mathematical proof.
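
The dependence of reliability and validity on test length described in this abstract can be illustrated with the standard classical-test-theory relations (the Spearman-Brown formula and its validity counterpart). This is a minimal sketch under that assumption, not the optimization procedure of the article itself; the function names and the length factor `k` are hypothetical and chosen only for illustration.

```python
import math


def lengthened_reliability(r_xx, k):
    """Spearman-Brown: reliability of a test whose length is multiplied by k."""
    return k * r_xx / (1 + (k - 1) * r_xx)


def lengthened_validity(r_xy, r_xx, k):
    """Correlation with a criterion after the test length is multiplied by k.

    r_xy is the original validity and r_xx the original reliability.
    """
    return r_xy * math.sqrt(k / (1 + (k - 1) * r_xx))


if __name__ == "__main__":
    # Doubling a test with reliability 0.50 and validity 0.30.
    print(lengthened_reliability(0.50, 2))
    print(lengthened_validity(0.30, 0.50, 2))
```

Because the gain in validity from lengthening shrinks as the reliability approaches 1, extra testing time is worth more for some tests than for others, which is why the allocation of a fixed total time can be optimized.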

4.
The Gleser-DuBois conditions for selecting from a number of test items those which will maximize the correlation between total test score and criterion will degenerate into expressions requiring only item counts on total distributions and the upper halves of distributions. A grouping convention for scores near medians is recommended. The inefficiency of the method is easily compensated for, because, regardless of the size of the sample, only standard test-scoring equipment and brief computations are required. A procedure is outlined, and some applications are discussed.

5.
An explicit solution is given to the problem of assigning relative lengths to the subtests of a test so as to maximize the correlation of the unit weight composite with a specified criterion when the total testing time is fixed. This solution is valid and unique whenever it specifies nonnegative times for all variables. A step-down procedure is suggested for cases in which some of the testing times are zero. This procedure does not necessarily provide an optimal allocation. However, in the examples studied it is found to provide near-optimum results. Algorithms are also developed for the determination of the least total testing time required to attain specified multiple and composite correlations. A numerical example is given illustrating the use of the unit weight procedure in combination with the regression weight algorithm. Supported in part by the Personnel and Training Branch of the Office of Naval Research under Contract Number 000-14-69C-0119, Melvin R. Novick, Principal Investigator. Reproduction, translation, publication, use and disposal in whole or in part by or for the United States Government is permitted.

6.
This paper discusses the influence of test difficulty on the correlation between test items and between tests. The greater the difference in difficulty between two test items or between two tests, the smaller the maximum correlation between them. In general, the greater the number of degrees of difficulty among the items in a test or among the tests in a battery, the higher the rank of the matrix of intercorrelations; that is, differences in difficulty are represented in the factorial configuration as additional factors. The suggestion is made that if all tests included in a battery are roughly homogeneous with respect to difficulty, existing hierarchies will be more clearly defined and meaningful psychological interpretation of factors more readily attained.

7.
This paper uses an extension of the network algorithm originally introduced by Mehta and Patel to construct exact tail probabilities for testing the general hypothesis that item responses are distributed according to the Rasch model. By assuming that item difficulties are known, the algorithm is applicable to the statistical tests either given the maximum likelihood ability estimate or conditioned on the total score. A simulation study indicates that the network algorithm is an efficient tool for computing the significance level of a person fit statistic based on test lengths of 30 items or less.

8.
Three data sets are analyzed that permit double cross-validation of a test battery against criterion variables in a number of educational programs or jobs. The validity of the first general factor score is compared with that obtained from the set of cross-validated regression weights, and is found to account, respectively, for approximately 85, 90 and 120 percent as much criterion variance as the cross-validated regression weights. Small further contributions appear to be made by a mechanical/technical and by a psychomotor factor. However, for a wide range of criterion variables the major role in validity appears to be played by a common general factor.

9.
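Liu Yue, Liu Hongyun 《心理科学》 (Psychological Science) 2015,(6):1504-1512
This study explores how to equate test scores when no anchor items are available by using the constructed anchor test method. Study 1 and Study 2 examine, respectively, how errors in the ordering of item difficulty and differences in anchor item difficulty affect the constructed anchor test method. The results show: (1) under equivalent-groups conditions, as the proportion of erroneous anchor items, the degree of difficulty-ordering error, and the difference in anchor item difficulty increase, the equating error of the constructed anchor test method gradually increases, whereas the equating error of the random equivalent groups method remains fairly stable; under nonequivalent-groups conditions, the equating error of the constructed anchor test method is always smaller than that of the random equivalent groups method; (2) for the constructed anchor test method under nonequivalent-groups conditions, the shorter the anchor test, the larger the equating error.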

10.
The effect of repeated testing on delayed relearning of paired associates was investigated. Participants learned two lists of Lithuanian-Dutch word pairs until reaching the criterion of one correct recall from long-term memory. In one condition, items subsequently received three post-retrieval study trials and in the other condition items received three post-retrieval test trials. Participants returned one week later for delayed recall and relearning. Post-retrieval test trials resulted in better delayed recall performance than post-retrieval study trials. Moreover, we found that the items that were repeatedly studied or tested one week prior to relearning were relearned faster than a new set of similar (not previously presented) items. Most importantly, items were relearned faster when they had previously been learned under conditions of post-retrieval testing than items learned under conditions of post-retrieval study. Taken together, the results indicate that the benefits of repeated testing are not just limited to conscious recall on a delayed test. Repeated testing during initial learning is also a very effective strategy to enhance delayed relearning.

11.
This article describes the 1997 revision of the Dutch Rating System for Test Quality used by the Committee of Test Affairs of the Dutch Association of Psychologists (COTAN). The revised rating system evaluates the quality of a test on 7 criteria: Theoretical basis and the soundness of the test development procedure, Quality of the testing materials, Comprehensiveness of the manual, Norms, Reliability, Construct validity, and Criterion validity. For each criterion, a checklist with a number of items is provided. Some items (for each criterion at least 1) are so-called key questions, which check whether certain minimum conditions are met. If a key question is rated negative, the rating for that criterion will automatically be "insufficient." To enhance a uniform interpretation of the items by the raters and to explain the system to test users and test developers, comment sections provide detailed information on rating and weighting the items. Once the items have been rated, the final grades (insufficient, sufficient, or good) for the 7 criteria are established by means of weighting rules.

12.
The ratio of item validity to item-total correlation can be used to select items which will tend to yield the maximum correlation with a criterion. Items to be retained are identified by comparing the ratio for each item with the validity of the original test. Further improvement of the validity in the experimental sample can be obtained by adding items to or removing items from the selected nucleus, according to recomputed ratios involving the correlations of the items with the nucleus and evaluated by means of a revised cut-off point. With slight variations, the method may be used for interest and personality tests as well as for aptitude material. The principal advantage over previous methods is that for any cycle of the analysis an exact cut-off point is provided.

13.
Mental set is the tendency to solve certain problems in a fixed way based on previous solutions to similar problems. The moment of insight occurs when a problem cannot be solved using solution methods suggested by prior experience and the problem solver suddenly realizes that the solution requires different solution methods. Mental set and insight have often been linked together and yet no attempt thus far has systematically examined the interplay between the two. Three experiments are presented that examine the extent to which sets of noninsight and insight problems affect the subsequent solutions of insight test problems. The results indicate a subtle interplay between mental set and insight: when the set involves noninsight problems, no mental set effects are shown for the insight test problems, yet when the set involves insight problems, both facilitation and inhibition can be seen depending on the type of insight problem presented in the set. A two-process model is detailed to explain these findings that combines the representational change mechanism with that of proceduralization.

14.
Maximum validity of a test with equivalent items   Cited by: 1 (self-citations: 0, citations by others: 1)
It is assumed that a scale of true scores on a function exists and that the probability of answering an item correctly is a curve of the type of the integral of the normal curve. The product moment correlation between the test score and true score is derived for a normal distribution of subjects and a test composed of equivalent items. Numerical examples demonstrate that the maximum correlation between test scores and true scores occurs for a one hundred item test when the point correlation between items is less than three tenths.

15.
This article introduces new statistics for evaluating score consistency. Psychologists usually use correlations to measure the degree of linear relationship between 2 sets of scores, ignoring differences in means and standard deviations. In medicine, biology, chemistry, and physics, a more stringent criterion is often used: the extent to which scores are identically equal. For each test taker (or other unit of measurement), the difference between the 2 scores is calculated. The root mean square difference (RMSD) represents the average change from 1 set of scores to the other, and the concordance correlation coefficient (CCC) rescales this coefficient to have a maximum value of 1. This article shows the relationship of the RMSD and CCC to the intraclass correlation coefficients, product-moment correlation, and standard error of measurement. Finally, this article adapts the RMSD and the CCC for linear, consistency, and absolute definitions of agreement.
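
The two statistics named in this abstract can be sketched in a few lines. The code below assumes the commonly used definition of the concordance correlation coefficient with 1/n variances and covariance; the article's own variants (linear, consistency, absolute) may differ in detail, and the function names are hypothetical.

```python
import math


def rmsd(x, y):
    """Root mean square difference between paired scores."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


def ccc(x, y):
    """Concordance correlation coefficient (1/n variances, assumed definition)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Equivalent to 1 - MSD / (vx + vy + (mx - my)**2), so it equals 1
    # only when every pair of scores is identical.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


if __name__ == "__main__":
    first = [1, 2, 3, 4]
    second = [2, 3, 4, 5]  # same ordering, shifted by one point
    print(rmsd(first, second), ccc(first, second))
```

In this example the two score sets are perfectly linearly related, so a product-moment correlation would be 1, yet the constant shift of one point lowers the CCC below 1. That is the stricter "identically equal" criterion the abstract describes.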

16.
Three rhesus monkeys were trained and tested in a same/different task with six successive sets of 70 item pairs to an 88% accuracy on each set. Their poor initial transfer performance (55% correct) with novel stimuli improved dramatically to 85% correct following daily item changes in the training stimuli. They acquired a serial-probe-recognition (SPR) task with variable (1-6) item list lengths. This SPR acquisition, although gradual, was more rapid for the monkeys than for pigeons similarly trained. Testing with a fixed list length of four items at different delays between the last list item and the probe test item revealed changes in the serial-position function: a recency effect (last items remembered well) for 0-s delay, recency and primacy effects (first and last list items remembered well) for 1-, 2-, and 10-s delays, and only a primacy effect for the longest 30-s delay. These results are compared with similar ones from pigeons and are discussed in relation to theories of memory processing.

17.
When K tests are given to N individuals, and for each individual there are two criterion measures, then (1) the multiple regression weight to be applied to the standard score for each test to predict the criterion-difference score equals the difference of the weights for predicting each criterion separately; (2) the difference between the predicted scores equals the predicted difference (each test being assigned the appropriate multiple regression weight); (3) the square of the multiple correlation between predicted and actual criterion-difference scores equals the sum of squares of the multiple correlations of the battery with each criterion less the product of these correlations and the correlation between predicted scores all divided by twice the quantity one minus the criterion intercorrelation; and (4) the variance of errors of estimating the criterion-difference score equals the sum of the variances of errors of estimating each criterion score minus twice the criterion intercorrelation, plus twice the correlation between predicted scores multiplied by the product of the square root of one minus the variance of errors of estimating one criterion and the corresponding square root for the second criterion. The author wishes to express his appreciation for the suggestions and guidance given by Dr. Harold Gulliksen in the preparation of this article. He also wishes to acknowledge the helpful comments of Dr. Paul Horst and Dr. Ledyard Tucker on certain phases of the development.
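
Results (1) and (2) of this abstract follow from the linearity of least squares, which a short derivation makes visible. The notation is an assumption introduced here, not the article's: $\mathbf{x}$ is the vector of the $K$ test scores in deviation form, $\mathbf{S}_{xx}$ their covariance matrix, $\mathbf{s}_{x1}$ and $\mathbf{s}_{x2}$ their covariances with the two criteria $y_1$ and $y_2$, and $d = y_1 - y_2$ the criterion difference, with both criteria expressed on a common scale.

```latex
\begin{aligned}
\mathbf{b}_d &= \mathbf{S}_{xx}^{-1}\,\mathbf{s}_{xd}
  = \mathbf{S}_{xx}^{-1}\,(\mathbf{s}_{x1} - \mathbf{s}_{x2})
  = \mathbf{b}_1 - \mathbf{b}_2, \\
\hat{d} &= \mathbf{b}_d^{\top}\mathbf{x}
  = \mathbf{b}_1^{\top}\mathbf{x} - \mathbf{b}_2^{\top}\mathbf{x}
  = \hat{y}_1 - \hat{y}_2 .
\end{aligned}
```

The first line uses only the fact that the covariance of each test with $d$ is the difference of its covariances with $y_1$ and $y_2$. If the difference score is restandardized, the weights are rescaled by a common constant but keep the same proportions.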

18.
Canonical redundancy analysis provides an estimate of the amount of shared variance between two sets of variables and provides an alternative to canonical correlation. The proof that the total redundancy is equal to the average squared multiple correlation coefficient obtained by regressing each variable in the criterion set on all variables in the predictor set is generalized to the case in which there are a larger number of criterion than predictor variables. It is then shown that the redundancy for the criterion set of variables is invariant under affine transformation of the predictor variables, but not invariant under transformation of the criterion variables.

19.
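Reviewable Cognitive Diagnostic Computerized Adaptive Testing (RCD-CAT), which allows examinees to revise their answers, helps diagnose examinees' knowledge states more accurately. The Item Pocket (IP) method gives examinees the opportunity to set items aside and revise their responses, and the Modified IP (MIP) method rescores the items revised within the IP. A simulation study compared the performance of IP, MIP, stocking Ⅰ, and stocking Ⅱ in RCD-CAT. The results showed that the stocking designs performed best, with stocking Ⅱ slightly outperforming stocking Ⅰ; that the classification accuracy of the IP and MIP methods was lower than that of traditional CD-CAT; and that the stocking designs have good application prospects in RCD-CAT.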

20.
A theoretical discussion of the factor pattern of predictor tests and criterion shows that ordinary test selection methods break down under certain circumstances. It is shown that maximal results may not occur if suppressor variables are present among the predictors. Suggested solutions to the problem include: (1) prior item analysis of tests against the criterion, (2) selection of several trial batteries including some with suppressor variables on the basis of a factor analysis of tests and criterion, (3) modification of the usual test selection procedures to include separate solutions based upon each of several starting variables, or (4) the cumbersome and tedious solution of all possible combinations of predictors. The solutions are recommended in the order named above. Although all of the suggested solutions involve added labor and may not be necessary, the test or battery constructor should at least be aware of the problem.
