1.

The Three‐option Format for Knowledge and Ability Multiple‐choice Tests: A case for why it should be more commonly used in personnel testing

Bryan D. Edwards Winfred Arthur Jr Leonardis L. Bruce 《International Journal of Selection & Assessment》2012,20(1):65-81

Multiple‐choice (MC) tests are arguably the most widely used testing format in applied settings. In the psychometric and education literatures, research on the optimal number of options for knowledge and ability MC tests has revealed that three‐option tests are psychometrically equivalent to and, in some cases, superior to five‐option tests. In addition, there are a number of practical, economic, and administrative advantages associated with the use of three‐option MC tests. Yet, despite its advantages, the three‐option format is underutilized in personnel selection. Across two studies, we compared test‐taker perceptions, criterion‐related validity, and sex‐based subgroup differences, and in Study 1, we compared race‐based subgroup differences on three‐ and five‐option tests. Participants in the two studies completed a three‐ or five‐option version of the ACT. Test perceptions, criterion‐related validity, and race‐ and sex‐based subgroup differences were similar across test formats. The implications for the expanded use of three‐option tests in applied settings and future directions for research are discussed.

2.

Technique for weighting of choices and items on I.B.M. scoring machines

Grossman Sergeant David 《Psychometrika》1944,9(2):101-105

A technique has been developed which permits the weighting of responses of test items on the I. B. M. scoring machine on the initial scoring, heretofore impossible. This is done by making the length of the response lines on the answer sheet longer or shorter as weights are needed. It is anticipated that this method will prove useful wherever differential weighting serves to increase the validity of tests.

3.

Selecting Predictor Subsets: Considering validity and adverse impact

Wilfried De Corte Paul Sackett Filip Lievens 《International Journal of Selection & Assessment》2010,18(3):260-270

The paper proposes a procedure for designing Pareto‐optimal selection systems considering validity, adverse impact and constraints on the number of predictors from a larger subset that can be included in an operational selection system. The procedure determines Pareto‐optimal composites of a given maximum size thereby solving the dual task of identifying the predictors that will be included in the reduced set and determining the weights with which the retained predictors will be combined to the composite predictor. Compared with earlier proposals, the simultaneous consideration of both tasks makes it possible to combine several strategies for reducing adverse impact in a single procedure. In particular, the present approach allows integrating (a) investigating a large number of possible predictors (such as a multitest battery of ability tests, or a collection of ability and nonability measures); (b) explicit predictor weighting within feasible test procedures of a given limited size.

4.

Subject Matter Expert Judgments Regarding the Relative Importance of Competencies are not Useful for Choosing the Test Batteries that Best Predict Performance

Kevin R. Murphy Paige J. Deckert Theodore B. Kinney Mei‐Chuan Kung 《International Journal of Selection & Assessment》2013,21(4):419-429

Several recent articles have suggested that assessments of the relative importance of different abilities or competencies to a job have little bearing on the criterion‐related validity of the selection tests that measure those abilities. We hypothesize that selection test batteries chosen to maximize the judged importance of knowledge, skills, and abilities will not predict performance better than batteries of tests chosen at random. The results in two independent samples consistently show that the validity of test batteries chosen based on subject matter expert judgments of importance is not different from the validity of batteries of a comparable number of tests chosen at random from a set of intercorrelated tests, or even those chosen to provide the worst possible match between test content and job content.

5.

DEVELOPMENT OF A JOB ANALYSIS-BASED PROCEDURE FOR WEIGHTING AND COMBINING CONTENT RELATED TESTS INTO A SINGLE TEST BATTERY SCORE

WINFRED ARTHUR JR. DENNIS DOVERSPIKE GERALD V. BARRETT 《Personnel Psychology》1996,49(4):971-985

By definition, content-related approaches to test validation do not rely on criterion data. As a consequence, regression and other statistical procedures for weighting and generating a composite score from a test battery are not applicable when a content-related validation strategy is used. This paper presents a procedure for determining the component weights for a test battery that has been developed on the basis of a content-related validity strategy. The Relative Content Contribution (RCC) weighting procedure is a logical extension of the conceptual basis underlying the rational developmental process used to demonstrate the validity of content-related tests. Results from field implementations of the procedure in the development of two promotional test batteries (fire and police) and an entry-level test battery (police) in two large metropolitan cities are presented to illustrate the procedure.

6.

Some Comments on Pareto Thinking, Test Validity, and Adverse Impact: When ‘and’ is optimal and ‘or’ is a trade‐off

Denise Potosky Philip Bobko Philip L. Roth 《International Journal of Selection & Assessment》2008,16(3):201-205

De Corte, Lievens, and Sackett add to the literature on selection test validity and adverse impact (AI). Their Pareto‐based weighting scheme essentially asks organizations if they are willing to give up some validity to hopefully achieve some reduction in AI. We considered their approach and conclusions in relation to the regression weighting method we used, and we offer five points that reflect our observations as well as our shared goals. We hope our comments, like their work in this field, will invigorate the pursuit of new ways of examining, and one day resolving, the persistent concern regarding the AI associated with valid selection tests.

7.

Expert and Target Scoring: Their relation, corresponding test instructions, and their effects on the construct validity of the video‐based social understanding test (VSU)

Kristin Conzelmann Panja Goerke 《International Journal of Selection & Assessment》2015,23(1):1-13

This study investigated the relation between expert and target scoring of a video‐based social understanding test (VSU) under two different types of instructions (internal and observer). The effects of the scoring methods and instructions on the VSU's construct validity were also examined. A total of 529 pilot applicants completed the VSU (some with internal and some with observer instructions), cognitive ability and knowledge tests, and a personality questionnaire. A subsample (n = 132) completed the VSU again with the other instructions and participated in an assessment center (AC). The two scores were moderately correlated; correlations decreased when the instructions were considered. Neither expert nor target scores showed convergent validity with AC variables; none of the scoring‐instruction combinations showed significant associations with the remaining measures.

8.

Validity Analysis of Biodata Assessment

Yan Jin, Wu Yingjie, Zhang Wei 《Acta Psychologica Sinica》2010,42(3):423-433

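Biodata is an important tool in personnel assessment, but because it is bound to its organizational context, instruments developed abroad cannot be applied directly to personnel selection in China, and empirical studies verifying its reliability and validity in China are lacking. Drawing on the recruitment and selection work of a telecommunications company, this study developed a biodata analysis instrument and collected biodata, general cognitive ability, Big Five personality, and interview results for 250 applicants, then evaluated the instrument in terms of criterion-related validity and incremental validity. With interview results as the criterion, biodata showed good criterion-related validity, and good incremental validity when used in combination with other assessment tools.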

9.

POTENTIAL EFFECTS OF BANDING AS A FUNCTION OF TEST RELIABILITY

KEVIN R. MURPHY 《Personnel Psychology》1994,47(3):477-495

Cascio, Outtz, Zedeck, and Goldstein (1991) described the application of a number of test score banding procedures in personnel selection. Equations are developed illustrating the relationship between the width of test score bands and test reliability. When reliability is moderate to low, bands are likely to be larger than the standard deviation of the test, and are likely to include a large proportion of the applicant pool. The relationships between band widths and the differences between higher scoring and lower scoring groups are also examined. When the band is smaller than the differences between groups (which may happen when highly reliable tests are used), banding may not by itself prove effective as a means of reducing the adverse impact of tests, even when banding systems that maximize opportunities for members of the lower scoring group are used.
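
The link between reliability and band width can be illustrated with the standard-error-of-the-difference (SED) approach commonly used in score banding. The sketch below is not taken from the article's own equations. It assumes the usual classical test theory formulas, SEM = SD * sqrt(1 - r_xx) and SED = SEM * sqrt(2), with a band of z * SED. The function name and the default z = 1.96 are illustrative choices.

```python
import math


def band_width_in_sd_units(reliability, z=1.96):
    """Width of an SED-based score band, expressed in test SD units.

    Assumes classical test theory:
      SEM = SD * sqrt(1 - reliability)
      SED = SEM * sqrt(2)
      band width = z * SED
    With SD fixed at 1, the result is the band width in SD units.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sem = math.sqrt(1.0 - reliability)   # standard error of measurement (SD = 1)
    sed = sem * math.sqrt(2.0)           # standard error of the difference
    return z * sed


if __name__ == "__main__":
    for r_xx in (0.70, 0.80, 0.90, 0.95):
        print(f"reliability {r_xx:.2f}: band = {band_width_in_sd_units(r_xx):.2f} SD")
```

Under these assumptions a test with reliability .80 yields a band wider than one standard deviation, while a test with reliability .95 yields a band well under one, which matches the direction of the relationship described in the abstract.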

10.

Choosing the best method for local validity estimation: relative accuracy of meta-analysis versus a local study versus Bayes-analysis

Newman DA Jacobs RR Bartram D 《The Journal of applied psychology》2007,92(5):1394-1413

This study assessed the relative accuracy of 3 techniques--local validity studies, meta-analysis, and Bayesian analysis--for estimating test validity, incremental validity, and adverse impact in the local selection context. Bayes-analysis involves combining a local study with nonlocal (meta-analytic) validity data. Using tests of cognitive ability and personality (conscientiousness) as predictors, an empirically driven selection scenario illustrates conditions in which each of the 3 estimation techniques performs best. General recommendations are offered for how to estimate local parameters, based on true population variability and the number of studies in the meta-analytic prior. Benefits of empirical Bayesian analysis for personnel selection are demonstrated, and equations are derived to help guide the choice of a local validity technique (i.e., meta-analysis vs. local study vs. Bayes-analysis).
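
One simple way to picture how a local study and a meta-analytic prior might be combined is a normal-normal, precision-weighted average. This is a textbook simplification and not the derivation in the article. The function name, the large-sample variance approximation (1 - r^2)^2 / (N - 1) for the local correlation, and the example numbers are all assumptions made for illustration.

```python
def bayes_validity(prior_mean, prior_sd, local_r, local_n):
    """Precision-weighted combination of a meta-analytic prior and a local r.

    prior_mean, prior_sd: mean and SD of true validities from the meta-analysis
    local_r, local_n:     observed local validity and local sample size
    Returns (posterior_mean, posterior_variance) under a normal-normal model.
    """
    if prior_sd <= 0 or local_n <= 1:
        raise ValueError("prior_sd must be positive and local_n must exceed 1")
    prior_var = prior_sd ** 2
    # Assumed large-sample approximation to the sampling variance of r.
    local_var = (1.0 - local_r ** 2) ** 2 / (local_n - 1)
    prior_precision = 1.0 / prior_var
    local_precision = 1.0 / local_var
    total_precision = prior_precision + local_precision
    post_mean = (prior_precision * prior_mean
                 + local_precision * local_r) / total_precision
    return post_mean, 1.0 / total_precision


if __name__ == "__main__":
    mean, var = bayes_validity(prior_mean=0.25, prior_sd=0.10,
                               local_r=0.40, local_n=50)
    print(f"posterior validity = {mean:.3f}, posterior variance = {var:.4f}")
```

In this sketch the posterior estimate always falls between the prior mean and the local estimate, and it moves toward the local value as the local sample grows or as the prior becomes more variable.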

11.

Psychomotor abilities via touch‐panel testing: Measurement innovations, construct, and criterion validity

Phillip L. Ackerman Anna T. Cianciolo 《Human Performance》2013,26(3-4):231-273

Assessment of psychomotor abilities for prediction of human performance is briefly reviewed. Reasons for the abandonment of psychomotor testing for selection applications are described. We review innovations in touch‐sensitive computer monitors as a methodology for relatively low‐cost, highly flexible test development, validation, and application of standard psychomotor tests. The development and evaluation of 5 psychomotor test types are described including discrete response tests (choice‐simple reaction time [RT], serial RT, and tapping) and continuous‐response tests (maze tracing and mirror tracing). Two empirical studies of the new psychomotor tests are presented, with a broad array of perceptual speed and cognitive abilities providing evidence for construct validity. In addition, some of the psychomotor tests are validated against a real‐time simulation criterion (the Kanfer‐Ackerman Air Traffic Controller Task©). We argue that these new innovations provide a means toward revisiting psychomotor testing to augment employee selection batteries.

12.

Understanding how and why adding valid predictors can decrease the validity of selection composites: A generalization of Sackett, Dahlke, Shewach, and Kuncel (2017)

Kevin R. Murphy 《International Journal of Selection & Assessment》2019,27(3):249-255

It is usually assumed that adding more valid predictors will increase the predictive power of a selection test battery. Sackett, Dahlke, Shewach, and Kuncel showed that when selection tests are combined using unit weights, adding a valid predictor can lead to a decrease in validity. Situating the Sackett et al. approach in a more general multivariate framework, I show how: (a) it is the tradeoff between predictor validity and predictor intercorrelations, and not the differences in predictor validities, that determines whether adding a valid predictor to a composite will cause the validity of that composite to increase or decrease; and (b) this same dynamic applies across a wide range of non‐optimal schemes for weighting predictors and/or criteria.
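
The unit-weighting effect can be checked numerically with the standard formula for the correlation between a criterion and a sum of standardized predictors: the sum of the predictor validities divided by the square root of the sum of all entries of the predictor intercorrelation matrix. The sketch below is an illustration under that assumption, not a reproduction of the article's framework. The function name and the example correlations are hypothetical.

```python
import math


def unit_weighted_validity(validities, intercorrelations):
    """Validity of a unit-weighted composite of standardized predictors.

    validities:        list of predictor-criterion correlations
    intercorrelations: full k x k predictor correlation matrix (1.0 on diagonal)
    Returns sum(validities) / sqrt(sum of all matrix entries).
    """
    k = len(validities)
    if len(intercorrelations) != k or any(len(row) != k for row in intercorrelations):
        raise ValueError("intercorrelation matrix must be k x k")
    composite_variance = sum(sum(row) for row in intercorrelations)
    return sum(validities) / math.sqrt(composite_variance)


if __name__ == "__main__":
    alone = unit_weighted_validity([0.50], [[1.0]])
    # Add a second valid predictor (r = .30) that overlaps heavily with the first.
    redundant = unit_weighted_validity([0.50, 0.30], [[1.0, 0.7], [0.7, 1.0]])
    # Add the same predictor, but uncorrelated with the first.
    distinct = unit_weighted_validity([0.50, 0.30], [[1.0, 0.0], [0.0, 1.0]])
    print(f"one predictor: {alone:.3f}")
    print(f"two predictors, r12 = .70: {redundant:.3f}")
    print(f"two predictors, r12 = .00: {distinct:.3f}")
```

With these hypothetical values the same added predictor lowers composite validity when it correlates .70 with the first predictor and raises it when the two are uncorrelated, so the outcome turns on the intercorrelation rather than on the validity gap alone.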

13.

Regression weights as a function of test length

HORST P 《Psychometrika》1948,13(3):125-134

A battery of pencil-and-paper tests is commonly used for predicting a single criterion. If the score on each test is the number of correct answers, the composite battery score would normally be the sum of the weighted test scores, where the weights are the raw score regression weights. Knowing the reliability of each test, it is possible to alter the lengths of the tests in a manner such that the weights will all be equal. The composite battery score would then simply be the total number of items answered correctly and scoring would be greatly simplified. Such simplification is particularly desirable where the volume of testing is large. Section I of the article outlines the procedure for altering the lengths of the tests, and Section II gives a proof of the method.

14.

The Cross‐cultural Transportability of Situational Judgment Tests: How does a US‐based integrity situational judgment test fare in Spain?

Filip Lievens Jan Corstjens Miguel Ángel Sorrel Francisco José Abad Julio Olea Vicente Ponsoda 《International Journal of Selection & Assessment》2015,23(4):361-372

Despite the globalization of HRM, there is a dearth of research on the potential use of contextualized selection instruments such as situational judgment tests (SJTs) in other countries than those where the selection instruments were originally developed. Therefore, two studies are conducted to examine the transportability of an integrity SJT that was originally developed in the United States to a Spanish context. Study 1 showed that most SJT scenarios (16 out of 19) that were developed in the United States were also considered realistic in a Spanish context. In Study 2, the item option endorsement patterns converged to the original scoring scheme, with the exception of two items. In addition, there were high correlations between the original US empirical scoring scheme and two empirical scoring schemes that were tailored to the Spanish context (i.e., mode consensus scoring and proportional consensus scoring). Finally, correlations between the SJT integrity scores and ratings on a self‐report integrity measure did not differ significantly from each other according to the type of scoring key (original US scoring vs. Spanish scoring keys). Overall, these results shed light on potential issues and solutions related to the cross‐cultural use of contextualized selection instruments such as SJTs.

15.

The problem of classification of personnel

THORNDIKE RL 《Psychometrika》1950,15(3):215-235

The personnel classification problem arises in its pure form when all job applicants must be used, being divided among a number of job categories. The use of tests for classification involves problems of two types: (1) problems concerning the design, choice, and weighting of tests into a battery, and (2) problems of establishing the optimum administrative procedure of using test results for assignment. A consideration of the first problem emphasizes the desirability of using simple, factorially pure tests which may be expected to have a wide range of validities for different job categories. In the use of test results for assignment, an initial problem is that of expressing predictions of success in different jobs in comparable score units. These units should take account of predictor validity and of job importance. Procedures are described for handling assignment either in terms of daily quotas or in terms of a stable predicted yield. Address of the President of the Division on Evaluation and Measurement of the American Psychological Association, delivered at Denver, Colorado, September 9, 1949.

16.

Effects of Situational Judgment Test Format on Reliability and Validity

Michelle P. Martin-Raugh Cristina Anguiano-Carrasco Teresa Jackson Meghan W. Brenneman Lauren Carney Patrick Barnwell 《International Journal of Testing》2018,18(2):135-154

Single-response situational judgment tests (SRSJTs) differ from multiple-response SJTs (MRSJTs) in that they present test takers with edited critical incidents and simply ask test takers to read over the action described and evaluate it according to its effectiveness. Research comparing the reliability and validity of SRSJTs and MRSJTs is thus far extremely limited. The study reported here directly compares forms of a SRSJT and MRSJT and explores the reliability, convergent validity, and predictive validity of each format. Results from this investigation present preliminary evidence to suggest SRSJTs may produce internal consistency reliability, convergent validity, and predictive validity estimates that are comparable to those achieved with many traditional MRSJTs. We conclude by discussing practical implications for personnel selection and assessment, and future research in psychological science more broadly.

17.

THE PREDICTIVE VALIDITY OF A WORK SAMPLE: A LABORATORY STUDY

MICHAEL K. MOUNT PAUL M. MUCHINSKY LAWRENCE M. HANSER 《Personnel Psychology》1977,30(4):637-645

Concurrent and predictive test validity data and test-retest reliability data were obtained for a work sample performance measure and two paper and pencil tests in a laboratory setting. The work sample predicted performance on the criterion comparably with the two traditional paper and pencil tests for both concurrent and predictive validity conditions. The results of this study coupled with the inherent advantages of work samples for personnel selection offer a favorable prognosis for future research and application of work samples. The findings are interpreted in light of a behavioral consistency model and the practical utility of work samples as a personnel selection technique.

18.

Using Invariance to Examine Cheating in Unproctored Ability Tests

Natalie A. Wright Adam W. Meade Sara L. Gutierrez 《International Journal of Selection & Assessment》2014,22(1):12-22

Despite their widespread use in personnel selection, there is concern that cheating could undermine the validity of unproctored Internet‐based tests. This study examined the presence of cheating in a speeded ability test used for personnel selection. The same test was administered to applicants in either proctored or unproctored conditions. Item response theory differential functioning analyses were used to evaluate the equivalence of the psychometric properties of test items across proctored and unproctored conditions. A few items displayed different psychometric properties, and the nature of these differences was not uniform. Theta scores were not reflective of widespread cheating among unproctored examinees. Thus, results were not consistent with what would be expected if cheating on unproctored tests was pervasive.

19.

AN EVALUATION OF ALTERNATE SCORING METHODS FOR THE MIXED STANDARD SCALE

GARRY L. HUGHES ERICH P. PRIEN 《Personnel Psychology》1986,39(4):839-847

This study investigated the psychometric properties of three methods of scoring a Mixed Standard Scale (MSS) performance evaluation: the patterned procedure as corrected by Saal (1979); a simple nonpatterned scoring procedure suggested by Prien, Jones, and Miller (1977), which gives equal weights to the performance statements; and a procedure that assigned differential weights to each statement on the basis of scale values provided by a panel of subject matter experts. Interrater reliabilities, scale variances for averaged ratings, and a convergent/discriminant validity analysis, which included an alternate method of job skill ratings, indicated no difference in the score distribution variance, interrater reliability, or validity of different method scores.

20.

Applicant Perceptions of Selection Procedures: The Role of Selection Information, Belief in Tests, and Comparative Anxiety

Filip Lievens Wilfried De Corte Katrien Brysse 《International Journal of Selection & Assessment》2003,11(1):67-77

This study addresses the effects of the provision of information on the reliability and validity of selection procedures and the effects of test-taker attitudes (i.e., belief in tests and comparative anxiety) on fairness perceptions. Prior to an actual selection process, applicants (N = 118) were given either information about the reliability and validity of various selection procedures or no information. Next, they evaluated the fairness of eight selection procedures. No significant effect of selection information was found. Belief in tests had significant effects, with applicants high on test belief giving higher fairness ratings than applicants low on test belief. In addition, an interaction effect between test belief and selection procedure was found. For example, test belief had larger effects on fairness for structured interviews, personality inventories, and cognitive ability tests. No significant effect of comparative anxiety on fairness was found.