Similar Documents
20 similar documents found.
1.
When K tests are given to N individuals, and for each individual there are two criterion measures, then (1) the multiple regression weight to be applied to the standard score for each test to predict the criterion-difference score equals the difference of the weights for predicting each criterion separately; (2) the difference between the predicted scores equals the predicted difference (each test being assigned the appropriate multiple regression weight); (3) the square of the multiple correlation between predicted and actual criterion-difference scores equals the sum of the squares of the multiple correlations of the battery with each criterion, less the product of these correlations and the correlation between predicted scores, all divided by twice the quantity one minus the criterion intercorrelation; and (4) the variance of errors of estimating the criterion-difference score equals the sum of the variances of errors of estimating each criterion score, minus twice the criterion intercorrelation, plus twice the correlation between predicted scores multiplied by the product of the square root of one minus the variance of errors of estimating one criterion and the corresponding square root for the second criterion. The author wishes to express his appreciation for the suggestions and guidance given by Dr. Harold Gulliksen in the preparation of this article. He also wishes to acknowledge the helpful comments of Dr. Paul Horst and Dr. Ledyard Tucker on certain phases of the development.
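
The four results can be written compactly. The following is a derivation sketch, not the paper's notation. It assumes the following symbols, all introduced here:
- the criteria y1 and y2 are in standard-score form with intercorrelation r12;
- z_k are the standardized test scores;
- beta_jk are the least-squares weights for criterion j, with multiple correlations R1 and R2;
- r_p is the correlation between the two predicted scores.

Because Var(d̂) = R1² + R2² - 2·R1·R2·r_p, this form places a factor of 2 on the product term in (3). Result (4) then follows from σ²_e = 1 - R².

```latex
% d = y_1 - y_2 is the criterion-difference score, so Var(d) = 2(1 - r_{12})
\begin{aligned}
\text{(1)}\quad & \beta_{dk} = \beta_{1k} - \beta_{2k}, \qquad k = 1,\dots,K \\
\text{(2)}\quad & \hat d = \textstyle\sum_k \beta_{dk} z_k = \hat y_1 - \hat y_2 \\
\text{(3)}\quad & R_d^2 = \frac{R_1^2 + R_2^2 - 2R_1R_2\, r_p}{2(1 - r_{12})} \\
\text{(4)}\quad & \sigma^2_{e_d} = \sigma^2_{e_1} + \sigma^2_{e_2} - 2r_{12}
  + 2r_p\sqrt{1-\sigma^2_{e_1}}\,\sqrt{1-\sigma^2_{e_2}},
  \qquad \sigma^2_{e_j} = 1 - R_j^2
\end{aligned}
```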

2.
Hulin, Henry, and Noon (1990) reviewed evidence from a number of studies which supported, in their view, the position that predictive validities decreased over time. If correct, their results would have significant implications for personnel selection practice and research. However, further analysis of their evidence suggested that their results may have only limited generalizability. More specifically, few of the studies they used to support their claim of decreasing predictive validities were field studies of predictor-criterion pairs. Furthermore, reported data on lagged intercorrelations were of limited relevance to the question of decreasing validities. Finally, a large body of data relevant to the issue of time-lagged validities in a personnel selection context was omitted because the data did not meet Hulin et al.'s restrictive criteria.

3.
TAYLOR CW, Psychometrika, 1950, 15(4): 391-406
For any fixed total time of testing it is possible, through proper item-and-time allotment, to combine tests into a battery so that the multiple correlation with a pre-assigned criterion will be maximized. By holding constant the ratio of the length in number of items to the time length for each test, a set of general equations has been derived which will yield this maximum value of the multiple R and will enable one to determine, in any given case, the optimal fraction of total testing time that should be devoted to each type of test under consideration. The set of general equations is applied to a two-test-battery problem to obtain the optimal length of each type of test for one hour of total testing time. If two other tests had been selected for the two-test sample problem, different subdivisions of the total time would generally occur. The manner in which the results would change when using other tests with different initial reliability, validity, and intercorrelation values is briefly presented. Some general implications of this method of battery development are also discussed. The writer is indebted to Max Woodbury for his assistance and especially to Dr. N. J. F. Van Steenberg and Dr. Anna S. Henriques, who provided valuable guidance and aid in the development of the solution to this problem. This paper is a revision of a thesis submitted in 1939 at the University of Utah in partial fulfillment of the requirements for the master's degree.
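
The allocation problem can be illustrated numerically. Below is a minimal sketch that uses brute-force grid search rather than the paper's closed-form general equations. It rests on these assumptions:
- Fixing the items-per-minute ratio lets time stand in for test length.
- Lengthening follows the Spearman-Brown model.
- The tryout lengths, reliabilities, validities and intercorrelation are hypothetical, as are the helper names `lengthen_factor`, `battery_r2` and `best_split`.

```python
import math

def lengthen_factor(k: float, rel: float) -> float:
    """Spearman-Brown factor for a test lengthened (or shortened) by factor k.

    A lengthened test correlates with any other variable at the original
    correlation times sqrt(k) / sqrt(1 + (k - 1) * rel).
    """
    return math.sqrt(k) / math.sqrt(1.0 + (k - 1.0) * rel)

def battery_r2(t1, t2, tests, r12):
    """Squared multiple R of a two-test battery when test j gets t_j minutes.

    tests: [(tryout_minutes, reliability, validity), ...] for the two tests.
    """
    (b1, rel1, val1), (b2, rel2, val2) = tests
    f1 = lengthen_factor(t1 / b1, rel1)
    f2 = lengthen_factor(t2 / b2, rel2)
    # Lengthened validities and intercorrelation (errors assumed independent).
    v1, v2, c12 = val1 * f1, val2 * f2, r12 * f1 * f2
    return (v1 ** 2 + v2 ** 2 - 2 * v1 * v2 * c12) / (1 - c12 ** 2)

def best_split(total, tests, r12, step=1):
    """Search whole-minute splits of the total time for the largest R."""
    t1 = max(range(0, total + 1, step),
             key=lambda t: battery_r2(t, total - t, tests, r12))
    return t1, total - t1, math.sqrt(battery_r2(t1, total - t1, tests, r12))

# Hypothetical inputs: (minutes in tryout form, reliability, validity).
tests = [(20, 0.80, 0.40), (30, 0.60, 0.35)]
t1, t2, r = best_split(60, tests, r12=0.30)
print(f"test A: {t1} min, test B: {t2} min, multiple R = {r:.3f}")
```

Changing the reliabilities or the intercorrelation in `tests` shifts the optimal split, in line with the abstract's remark that different tests would generally yield different subdivisions of the hour.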

4.
Although most studies of criterion-related validity focus on univariate relationships, the complex and multidimensional nature of the performance construct and the widespread use of multiple selection devices argue in favor of multivariate frameworks for evaluating validity. Using a Monte Carlo simulation, we estimated the validity of general cognitive ability tests and personality tests in predicting "job performance," where performance is conceptualized as a composite of multiple performance measures (i.e., individual job task performance and organizational citizenship behaviors). The validity of a selection battery varies substantially as a function of the relative weight given to both predictors and criteria; the 95% confidence interval for validities ranged from .20 to .78. The effective weights given to performance dimensions accounted for 34% of the variance in selection battery validities; depending on precisely how "performance" is defined, the same test battery can have relatively high or relatively low levels of validity. Our model suggests that the way an organization defines job performance is a source of true and important variability in validities, and that the validity of selection tests for predicting complex performance criteria may show considerably less generalizability than current meta-analyses of univariate validities would suggest.
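
The dependence of battery validity on predictor and criterion weights can be computed directly for any single set of weights. For standardized variables, the correlation between the composites w'x and v'y is w'R_xy v / sqrt(w'R_xx w · v'R_yy v). A Monte Carlo study would repeat this calculation over sampled weights and correlation matrices. The sketch below shows only the analytic core; every correlation value and the helper names are hypothetical.

```python
import math

def bilinear(a, m, b):
    """a' M b for plain nested lists (M has len(a) rows and len(b) columns)."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def composite_validity(w, v, rxx, ryy, rxy):
    """Correlation between predictor composite w'x and criterion composite v'y.

    rxx, ryy: predictor and criterion intercorrelation matrices.
    rxy: predictor-by-criterion validity matrix (standardized variables).
    """
    return bilinear(w, rxy, v) / math.sqrt(bilinear(w, rxx, w) * bilinear(v, ryy, v))

# Hypothetical values: predictors = (cognitive ability, personality),
# criteria = (task performance, citizenship behavior).
rxx = [[1.0, 0.0], [0.0, 1.0]]
ryy = [[1.0, 0.5], [0.5, 1.0]]
rxy = [[0.50, 0.15], [0.20, 0.30]]

# Same unit-weighted battery, different definitions of "performance".
for task_weight in (1.0, 0.75, 0.5, 0.25, 0.0):
    v = [task_weight, 1.0 - task_weight]
    print(task_weight, round(composite_validity([1.0, 1.0], v, rxx, ryy, rxy), 3))
```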

5.
Controversy abounds over attributing group differences on tests to nature, nurture, or test bias. Limitations of correlational sampling from natural populations necessitate experimental methods to resolve underlying issues. In classical psychometrics, test items are selected from a larger item pool through analysis of item responses in a sample of subjects. Rats of six inbred strains (n = 366) were tested in multiple mazes to provide a large item pool. Six populations were created, each with differing proportions of each strain. Items selected through independent item analyses within each population yielded six tests. An independent cross-validation sample (n = 146) provided scores on all six tests. This sample was also tested in another set of maze problems defined as the criterion to be predicted. Strain means and intrastrain predictive validities for the six tests varied with strain representation in the population used for item selection (p < .001). Conventional item-selection procedures clearly produced two forms of minority test bias.

6.
This paper discusses the influence of test difficulty on the correlation between test items and between tests. The greater the difference in difficulty between two test items or between two tests, the smaller the maximum correlation between them. In general, the greater the number of degrees of difficulty among the items in a test or among the tests in a battery, the higher the rank of the matrix of intercorrelations; that is, differences in difficulty are represented in the factorial configuration as additional factors. The suggestion is made that if all tests included in a battery are roughly homogeneous with respect to difficulty, existing hierarchies will be more clearly defined and meaningful psychological interpretation of factors will be more readily attained.
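
For pass/fail items, the first claim can be made concrete with the well-known ceiling on the phi coefficient. For two items with pass rates p_i ≤ p_j, phi can be no larger than sqrt(p_i(1 - p_j) / (p_j(1 - p_i))). This is a standard result offered as an illustration; the paper's own derivation may take a different form. The helper name `phi_max` is hypothetical.

```python
import math

def phi_max(p_i: float, p_j: float) -> float:
    """Largest phi attainable between two pass/fail items.

    p_i, p_j are the proportions passing (strictly between 0 and 1).
    The ceiling is reached when everyone who passes the harder item
    also passes the easier one.
    """
    lo, hi = sorted((p_i, p_j))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

# A medium item paired with progressively easier items.
for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(p, round(phi_max(0.5, p), 3))
```

Only items of equal difficulty can correlate perfectly. The ceiling falls steadily as the difficulty gap widens.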

7.
It is usually assumed that adding more valid predictors will increase the predictive power of a selection test battery. Sackett, Dahlke, Shewach, and Kuncel showed that when selection tests are combined using unit weights, adding a valid predictor can lead to a decrease in validity. Situating the Sackett et al. approach in a more general multivariate framework, I show that: (a) it is the tradeoff between predictor validity and predictor intercorrelations, and not the differences in predictor validities, that determines whether adding a valid predictor to a composite will cause the validity of that composite to increase or decrease; and (b) this same dynamic applies across a wide range of non-optimal schemes for weighting predictors and/or criteria.
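
The two-predictor, unit-weighted case shows the tradeoff directly. The composite validity is (r1 + r2) / sqrt(2 + 2·r12), so adding predictor 2 helps only if r2 > r1·(sqrt(2 + 2·r12) - 1). This threshold depends on the intercorrelation as well as on the validities. The sketch below uses hypothetical numbers and helper names.

```python
import math

def unit_weighted_validity(validities, intercorr):
    """Validity of the unit-weighted sum of standardized predictors."""
    k = len(validities)
    composite_var = sum(intercorr[i][j] for i in range(k) for j in range(k))
    return sum(validities) / math.sqrt(composite_var)

def addition_helps(r1, r2, r12):
    """Two-predictor case: does adding predictor 2 (unit weights) raise validity?

    (r1 + r2) / sqrt(2 + 2*r12) > r1  <=>  r2 > r1 * (sqrt(2 + 2*r12) - 1)
    """
    return r2 > r1 * (math.sqrt(2 + 2 * r12) - 1)

# Hypothetical incumbent predictor with validity .50.
alone = unit_weighted_validity([0.50], [[1.0]])
weak = unit_weighted_validity([0.50, 0.20], [[1.0, 0.0], [0.0, 1.0]])
strong = unit_weighted_validity([0.50, 0.40], [[1.0, 0.2], [0.2, 1.0]])
redundant = unit_weighted_validity([0.50, 0.40], [[1.0, 0.8], [0.8, 1.0]])
print(round(alone, 3), round(weak, 3), round(strong, 3), round(redundant, 3))
```

With these values, the weak and the redundant composites fall below .50 (about .495 and .474). The strong composite rises to about .581. The same .40-validity predictor helps or hurts depending only on its correlation with the incumbent.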

8.
The purpose of the study was to develop a battery of tests for use in evaluating the intra- and intersensory development of young children. A battery of 15 tests (4 visual, 4 auditory, 4 tactile-kinesthetic, and 3 intersensory) was administered to 109 normally developing 6- and 8-year-old children and 32 slowly developing or learning-disabled children. Interdependence of test items within each intrasensory category and within the intersensory category was determined; intercorrelations ranged from .00 to .78. Reliability estimates were also determined. Face validity was claimed for each item. The effects of age or developmental level on test performance were established. Based upon the interdependence of the tests, reliability estimates, and the capacity of the tests to discriminate among groups classified according to age or developmental level, a battery of 10 intra- and intersensory tests was proposed. The battery has 3 tests of visual perception (visual memory, dynamic depth perception, and size discrimination); 3 tests of auditory perception (auditory discrimination, auditory memory of related syllables, and auditory sequential memory of numbers); 2 tests of tactile-kinesthetic perception (tactile integration and movement awareness); and 2 tests of intersensory integration (auditory-tactile integration and auditory-visual integration).

9.
HORST P, Psychometrika, 1948, 13(3): 125-134
A battery of pencil-and-paper tests is commonly used for predicting a single criterion. If the score on each test is the number of correct answers, the composite battery score would normally be the sum of the weighted test scores, where the weights are the raw score regression weights. Knowing the reliability of each test, it is possible to alter the lengths of the tests in a manner such that the weights will all be equal. The composite battery score would then simply be the total number of items answered correctly and scoring would be greatly simplified. Such simplification is particularly desirable where the volume of testing is large. Section I of the article outlines the procedure for altering the lengths of the tests, and Section II gives a proof of the method.

10.
HORST P, Psychometrika, 1951, 16(2): 189-202
Given a fixed amount of total testing time, it is important to know how long each test in the battery should be so that the correlation of the battery with the criterion will be a maximum. The precise solution for the test lengths will depend on the particular set of conditions specified. The writer has previously presented solutions for two sets of conditions. This article presents the solution for a third set of conditions. These are: (1) The total number of items or testing time is fixed. (2) The score is the total number of items correctly answered. (3) The test lengths are determined in such a way that the correlation of total score with the criterion is a maximum. The solutions for the two previous sets of conditions, together with the current set, are summarized. A set of experimental data is submitted to each solution and the three sets of results are compared.

11.
Significant job-relatedness was found for a posttraining job knowledge test criterion using an application of Lawshe's content validity method. The aide test was used as a criterion to assess the predictive validity of a vocabulary test and a civil service test with samples of black (N = 43) and white (N = 62) psychiatric aides. Significant validities were found for both tests, but the vocabulary test proved to be the better predictor of the criterion in both samples. The obtained validities were discussed in terms of differential validity, test fairness, and sample size. This study demonstrated that a content validity method could be applied to criteria as well as to selection tests. It was concluded that content validity methods may be able to help solve the problem of criterion relevance in validation research by providing quantitative evidence of the job-relatedness of criteria.
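
Lawshe's method quantifies content validity from expert ratings. For each item, it computes the content validity ratio CVR = (n_e - N/2) / (N/2), where n_e of N panelists rate the item "essential". The item-level values are commonly summarized as a content validity index, the mean CVR of the retained items. The abstract does not report the panel size or ratings, so the counts below are hypothetical, as are the helper names.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half

def content_validity_index(cvrs):
    """Mean CVR of the retained items."""
    return sum(cvrs) / len(cvrs)

# Hypothetical panel of 10 experts rating four criterion (job knowledge) items.
essential_counts = [9, 8, 10, 7]
cvrs = [content_validity_ratio(n, 10) for n in essential_counts]
print(cvrs, content_validity_index(cvrs))
```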

12.
A study was conducted to investigate the nexus of cognitive and psychomotor tests as might be used for personnel selection and assessment. These two domains are frequently seen as independent. A multiple aptitude cognitive test battery and a psychomotor test battery were administered to 354 United States Air Force recruits. The average multiple correlation of the cognitive tests and each psychomotor score as a criterion was 0.34, corrected for range restriction. Confirmatory factor analyses disclosed general cognitive and general psychomotor factors, three lower-order psychomotor factors, and two lower-order cognitive factors. The general cognitive factor accounted for 39% of the variance and the general psychomotor factor accounted for 29% of the variance. Residualized, the lower-order factors accounted for between 3% and 10% of the variance. The average g saturations (loadings) of the cognitive and psychomotor tests were 0.82 and 0.34, respectively. An implication for personnel selection is that the incremental validity of psychomotor tracking tests beyond the validity of cognitive tests will be small due to the commonality of measurement. A further implication of the findings is the need to study the validity of the general and specific psychomotor factors.
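
The abstract does not state which range-restriction correction was applied. Thorndike's Case II formula, for direct restriction on the predictor, is a common choice and is sketched here as an assumption. It gives r_c = r·u / sqrt(1 + r²(u² - 1)), where u is the ratio of the unrestricted to the restricted predictor SD. The input values and the helper name are hypothetical.

```python
import math

def correct_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II correction for direct restriction on the predictor.

    r: correlation observed in the restricted (selected) sample.
    u: unrestricted SD / restricted SD of the predictor (u >= 1).
    """
    return r * u / math.sqrt(1 + r * r * (u * u - 1))

# Hypothetical: observed r = .25 among recruits, predictor SD ratio u = 1.4.
print(round(correct_range_restriction(0.25, 1.4), 3))
```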

13.
LONG WF, BURR IW, Psychometrika, 1949, 14(2): 137-161
A modification of the Wherry-Doolittle test selection method is presented by which tests are included in a multiple correlation (obtained for a given battery of tests) in the sequence in which the rate of return in validity per unit of testing time is greatest, rather than in the order of the size of their contribution to the multiple correlation. It is proposed that the modified method can be utilized profitably when there are economic or practical limits on the time available for test administration. The major portion of this article is based upon a thesis by W. F. Long directed by Dr. Joseph Tiffin with the counsel of Dr. Irving W. Burr. This thesis was submitted in partial fulfillment of the requirements for the degree of Master of Science in Psychology, Purdue University, June, 1947.
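
The selection rule can be sketched as greedy forward selection. At each step, the sketch adds the test with the largest gain in R² per minute, stopping when the time budget is exhausted. This simplification is not the Wherry-Doolittle computing scheme itself, which works through its own tabular procedure and applies a shrinkage correction. The correlations, test times, budget and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def r_squared(subset, rxx, rxy):
    """Squared multiple correlation of the criterion on the tests in subset."""
    if not subset:
        return 0.0
    a = [[rxx[i][j] for j in subset] for i in subset]
    b = [rxy[i] for i in subset]
    return sum(beta * r for beta, r in zip(solve(a, b), b))

def select_by_time(rxx, rxy, minutes, budget):
    """Greedy forward selection by R^2 gain per minute, within a time budget."""
    chosen, used = [], 0
    while True:
        base = r_squared(chosen, rxx, rxy)
        options = [j for j in range(len(rxy))
                   if j not in chosen and used + minutes[j] <= budget]
        if not options:
            return chosen
        gain = {j: (r_squared(chosen + [j], rxx, rxy) - base) / minutes[j]
                for j in options}
        best = max(gain, key=gain.get)
        if gain[best] <= 0:
            return chosen
        chosen.append(best)
        used += minutes[best]

# Hypothetical battery: intercorrelations, validities, and minutes per test.
rxx = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.2], [0.3, 0.2, 1.0]]
rxy = [0.45, 0.40, 0.30]
minutes = [40, 15, 10]
print(select_by_time(rxx, rxy, minutes, budget=60))
```

With these numbers the sketch picks the 15-minute test first and then the 10-minute test. It never selects the 40-minute test, even though that test has the highest single validity. This reflects the time-efficiency ordering the abstract describes.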

14.
If statewide test norms are useful in student counseling, state testing programs could provide predictions for individual students as a service to local schools. Since the choice of a high school curriculum is a major problem on which youth seeks guidance, the predictive validity of a statewide test battery, using state norms, is explored with curriculum choice as the criterion, and is compared with the validity of the same tests for the same criterion, but using local norms. The separation in state norms of career-goals groups and of the sexes is also explored. Results achieved are positive enough to encourage widespread and energetic application of the methods by state testing officers, and to imply their responsibility to make classification probabilities available to their customers.

15.
A meta‐analysis on the validity of tests of general mental ability (GMA) and specific cognitive abilities for predicting job performance and training success in the UK was conducted. An extensive literature search resulted in a database of 283 independent samples with job performance as the criterion (N=13,262), and 223 with training success as the criterion (N=75,311). Primary studies were also coded by occupational group, resulting in seven main groups (clerical, engineer, professional, driver, operator, manager, and sales), and by type of specific ability test (verbal, numerical, perceptual, and spatial). Results indicate that GMA and specific ability tests are valid predictors of both job performance and training success, with operational validities in the magnitude of .5–.6. Minor differences between these UK findings and previous US meta‐analyses are reported. As expected, operational validities were moderated by occupational group, with occupational families possessing greater job complexity demonstrating higher operational validities between cognitive tests and job performance and training success. Implications for the practical use of tests of GMA and specific cognitive abilities in the context of UK selection practices are discussed in conclusion.

16.
VALIDITY GENERALIZATION RESULTS FOR LAW ENFORCEMENT OCCUPATIONS
The Schmidt-Hunter interactive validity generalization procedure was applied to validity data for cognitive abilities tests for law enforcement occupations. Both assumed artifact distributions, and distributions of artifacts constructed from information contained in the current sample of studies were used to test the hypothesis of situational specificity and to estimate validity generalizability. Results for studies using a criterion of performance in training programs showed that validities ranged from .41 to .71, and for four test types the hypothesis of situational specificity could be rejected using the 75% decision rule. For the remaining test types, validity was generalizable, based on 90% credibility values ranging from .37 to .71. Results for studies using a criterion of performance on the job indicated that the hypothesis of situational specificity was not tenable for three test types, which had validities between .17 and .31. For the remaining test types, estimated mean true validities ranged from .10 to .26 and were generalizable to a majority of situations. Results for both groups of studies were essentially identical for the two types of artifact distribution. Possible reasons for the apparently lower validities and lesser generalizability for job performance criteria are discussed, including possible low validity of the criterion (due to lack of opportunity by supervisors to observe behavior) and the potential role of noncognitive factors in the determination of law enforcement job success. Suggestions for specifically targeted additional research are made.
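
The 75% rule and the 90% credibility value can be illustrated with a "bare-bones" Schmidt-Hunter meta-analysis, which corrects for sampling error only. The study's interactive procedure also corrects for artifacts such as criterion unreliability and range restriction through artifact distributions, and that step is not sketched here. The validity coefficients, sample sizes and the helper name are hypothetical.

```python
import math

def bare_bones(rs, ns):
    """Bare-bones Schmidt-Hunter meta-analysis (sampling error only)."""
    total = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total            # N-weighted mean r
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total
    n_bar = total / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)                   # expected sampling-error variance
    var_res = max(0.0, var_obs - var_err)                           # residual variance
    share = 1.0 if var_obs == 0 else min(1.0, var_err / var_obs)
    return {
        "mean_r": r_bar,
        "share_sampling_error": share,
        "situational_specificity_rejected": share >= 0.75,          # 75% decision rule
        "cv90": r_bar - 1.28 * math.sqrt(var_res),                  # 90% credibility value
    }

# Hypothetical validity coefficients and sample sizes for one test type.
result = bare_bones([0.10, 0.45, 0.20, 0.50, 0.15], [200, 250, 180, 220, 150])
print(result)
```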

17.
Maximizing the discriminating power of a multiple-score test involves maximizing the homogeneity of each subtest and minimizing the correlations between subtests. A method is presented for constructing such tests from items whose intercorrelations are not too high. Under certain restrictions the saturation, defined as the proportion of inter-item covariance to total variance, is maximized for each subtest. The nucleus of each subtest is three items with high covariances inter se. All items which will lower the saturation are discarded; the one item is added which will maximize the saturation of the resultant test. This process is repeated until all the items are included or discarded for that subtest. If the correlation between any such subtests approaches the geometric mean of their saturations, their items form a new pool for one or more subtests. Formulas are presented for deciding which items to eliminate in order to reduce further the correlations between subtests. This research was supported in part by the United States Air Force under Contract AF 33(038)-10588 with Human Resources Research Center, Lackland Air Force Base, San Antonio, Texas. Permission is granted for reproduction, translation, publication use and disposal in whole and in part by or for the United States Government.
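
The first stage of the procedure, growing one subtest, can be sketched from an item covariance matrix. Saturation is taken as the off-diagonal covariance sum divided by the variance of the subtest total. Several pieces are assumptions or omissions:
- Choosing the nucleus as the triple with the largest summed covariance is an assumption.
- The later steps (re-pooling subtests whose correlation approaches the geometric mean of their saturations, and the elimination formulas) are not sketched.
- The covariance matrix and the helper names are hypothetical.

```python
from itertools import combinations

def saturation(items, cov):
    """Share of the subtest score variance due to inter-item covariance."""
    total = sum(cov[i][j] for i in items for j in items)
    inter = total - sum(cov[i][i] for i in items)
    return inter / total

def build_subtest(pool, cov):
    """Grow one subtest from a three-item nucleus, maximizing saturation."""
    nucleus = max(combinations(pool, 3),
                  key=lambda t: cov[t[0]][t[1]] + cov[t[0]][t[2]] + cov[t[1]][t[2]])
    chosen = list(nucleus)
    remaining = [i for i in pool if i not in chosen]
    discarded = []
    while remaining:
        current = saturation(chosen, cov)
        scores = {i: saturation(chosen + [i], cov) for i in remaining}
        # Discard every item whose addition would lower the saturation.
        discarded += [i for i in remaining if scores[i] < current]
        remaining = [i for i in remaining if scores[i] >= current]
        if not remaining:
            break
        # Add the single item that maximizes the new saturation, then repeat.
        best = max(remaining, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen, discarded

# Hypothetical 6-item correlation matrix: items 0-3 cohere, items 4-5 do not fit.
cov = [[1.0 if i == j else (0.5 if i < 4 and j < 4 else (0.4 if {i, j} == {4, 5} else 0.05))
        for j in range(6)] for i in range(6)]
chosen, dropped = build_subtest(list(range(6)), cov)
print(chosen, dropped, round(saturation(chosen, cov), 3))
```

For n items, saturation times n/(n - 1) equals coefficient alpha, so maximizing saturation at a given length also maximizes alpha.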

18.
A battery of eight different reaction time (RT) tests, measuring the speed with which individuals perform various elementary cognitive processes, and a group test of scholastic aptitude (the Armed Services Vocational Aptitude Battery, ASVAB) were given to 50 black and 56 white male vocational college students. The regression of the general factor scores of the ASVAB on the RT measures yielded a shrunken multiple correlation of 0.465. Although discriminant analyses, when applied separately to the ASVAB subtests and to the RT variables, showed highly comparable overall discrimination (over 70% correct classification) between the black and white groups, factor scores derived from the general factor (labeled ‘speed of information processing’) of the RT battery show only about one-third as large a mean black-white difference as the mean group difference on the general factor scores derived from the ASVAB. Comparisons were also made between the 106 vocational college students and 100 university students of higher average academic aptitude who had previously been tested on the same RT battery (Vernon, 1983a). These groups showed marked differences on the RT variables, the largest differences occurring on the tests that required more complex cognitive processing. The more complex RT tests also correlate most highly with the psychometric measures of ability within each group. The results are consistent with the hypothesis that individual differences and the mean differences between groups in psychometric abilities and scholastic achievement are related to differences in the speed of information processing as measured in elementary cognitive tasks.
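
A "shrunken" multiple correlation adjusts the sample R for capitalization on chance. The abstract does not say which shrinkage formula was used. The familiar adjusted-R² (Wherry-type) formula, R_s² = 1 - (1 - R²)(n - 1)/(n - k - 1), is sketched here as one common choice. The sample size of 106 matches the study, but the observed R and the number of predictors k below are hypothetical, as is the helper name.

```python
import math

def shrunken_r(r: float, n: int, k: int) -> float:
    """Multiple R adjusted for capitalization on chance.

    Uses R_s^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1) with n cases and
    k predictors; negative estimates are floored at zero.
    """
    r2 = 1 - (1 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2, 0.0))

# Hypothetical: observed R = .60 with 106 cases and 8 predictors.
print(round(shrunken_r(0.60, 106, 8), 3))
```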

19.
A major goal of the Army Selection and Classification Project was to develop an experimental predictor battery that would best supplement the Armed Services Vocational Aptitude Battery for making selection and classification decisions for entry-level enlisted personnel. That is, what predictor measures would best serve the needs of all the jobs in an entire selection/classification system? This paper describes the characteristics of the new test battery and the procedures that were used to develop it. The major steps in the procedure were a structured literature search using a standard protocol, an extensive expert judgment study of expected true validities for a population of predictor variables against a population of performance components, fabrication of modularized software and a special response pedestal for computerized measurement of perceptual and psychomotor abilities, evaluations of experimental measures in three iterative pilot tests and one major field test, and a series of reviews by a panel of scientific advisers. The test battery that resulted from this two-and-a-half-year development effort is described. The basic psychometric properties of each measure, as determined in a large concurrent validation sample, are also described.

20.
The Great Eight competencies: a criterion-centric approach to validation
The author presents results of a meta-analysis of 29 validation studies (N=4,861) that uses the Great Eight competency factors (Kurz & Bartram, 2002) as the criterion measurement framework. Predictors of the Great Eight competencies based only on personality scales show moderate to good correlations with line-manager ratings for all 8 of the competencies. On their own, ability tests correlate with 4 of the 8 competencies, and together ability and personality data yield operational validities ranging from 0.20 to 0.44 for the 8 competencies. Operational validities for aggregated predictors with aggregated criteria were estimated to be 0.53. The value of differentiating the criterion space and of relating predictor variables to criterion variables in a one-to-one fashion is discussed.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417