Similar Literature
1.
2.
Empirical Bayes meta-analysis provides a useful framework for examining test validation. The fixed-effects case, in which ρ has a single value, corresponds to the inference that the situational specificity hypothesis can be rejected in a validity generalization study. A Bayesian analysis of such a case provides a simple and powerful test of ρ = 0; such a test has practical implications for significance testing in test validation. The random-effects case, in which σ_ρ² > 0, provides an explicit method for assessing the relative importance of local validity studies and previous meta-analyses. Simulated data are used to illustrate both cases. Results of published meta-analyses are used to show that local validation becomes increasingly important as σ_ρ² increases. The meaning of the term validity generalization is explored, and the problem of what can be inferred about test transportability in the random-effects case is described.
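The two cases can be illustrated with a minimal Python sketch. It assumes the bare-bones Hunter-Schmidt estimates (sample-size-weighted mean r, and σ_ρ² estimated as observed variance minus expected sampling-error variance) and a simple normal-normal precision weighting of a local validity coefficient against the meta-analytic mean. The function names and inputs are hypothetical, and this is not necessarily the author's exact empirical Bayes procedure. When σ_ρ² = 0 (the fixed-effects case) the local study receives no weight; as σ_ρ² grows, the local r counts for more.

```python
# Minimal sketch: bare-bones meta-analysis plus a precision-weighted
# ("shrinkage") estimate of validity in a local setting.
# Assumption: normal approximations throughout; no artifact corrections.


def sampling_error_var(r, n):
    """Approximate sampling variance of a correlation: (1 - r^2)^2 / (n - 1)."""
    return (1 - r ** 2) ** 2 / (n - 1)


def bare_bones_meta(rs, ns):
    """Return (N-weighted mean r, estimated variance of true validities)."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)
    var_err = sampling_error_var(r_bar, n_bar)
    # A negative residual variance is truncated to 0 (fixed-effects case).
    return r_bar, max(var_obs - var_err, 0.0)


def local_weight(r_local, n_local, var_rho):
    """Share of the combined estimate contributed by the local study."""
    if var_rho == 0.0:
        return 0.0
    prec_local = 1 / sampling_error_var(r_local, n_local)
    prec_prior = 1 / var_rho
    return prec_local / (prec_local + prec_prior)


def posterior_validity(r_local, n_local, rho_bar, var_rho):
    """Weighted combination of the local r and the meta-analytic mean."""
    w = local_weight(r_local, n_local, var_rho)
    return w * r_local + (1 - w) * rho_bar


if __name__ == "__main__":
    # Hypothetical validity coefficients and sample sizes.
    rho_bar, var_rho = bare_bones_meta([0.10, 0.50], [200, 200])
    print(round(rho_bar, 3), round(var_rho, 4))
    print(round(posterior_validity(0.45, 80, rho_bar, var_rho), 3))
```

Because the weight depends only on the ratio of the two precisions, the same local study matters more whenever true validities are known to vary more across settings.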

3.
Few published reports exist on the appropriateness of curriculum-based measurement (CBM) procedures for literature-based reading programs. This paper describes a preliminary study of the development and psychometric integrity of CBM norms for literature-based reading using children's literature books. During the 1993–1994 and 1994–1995 school years, the reading achievement of 403 first-, second-, and third-grade students in two rural elementary schools was assessed with CBM methods modified for a literature-based curriculum. The results indicate that standard CBM procedures can be modified to develop local norms for the literature-based classroom. Furthermore, these literature-based CBM norms were found to have adequate test-retest reliability and criterion-related validity. No sampling bias was found in the selection of individual reading passage probes. The extent to which these norms can be used to monitor reading progress remains to be determined, and further study is necessary before CBM procedures with literature-based reading programs can be used to inform policy and practice.

4.
The potential for applicant response distortion on personality measures remains a major concern in high-stakes testing situations. Many approaches to understanding response distortion are either too transparent (e.g., instructed faking studies) or too subtle (e.g., correlations with social desirability measures as indices of faking). Recent research points to two more promising methods: using forced-choice (FC) personality test items and warning test takers against faking. The present study examined the effects of these two methods on criterion-related validity and test-taker reactions. Results supported incremental validity, above and beyond cognitive ability, for both an FC and a Likert-scale measure in both the warning and no-warning conditions. No clear validity differences emerged between the FC and Likert measures or between the warning and no-warning conditions. However, some evidence suggested that FC measures and warnings may produce negative test-taker reactions. We conclude with implications for implementation in selection settings.
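Incremental validity "above and beyond cognitive ability" is usually expressed as the gain in R² when the personality score is added to a regression that already contains cognitive ability. The sketch below computes that ΔR² from the three pairwise correlations using the standard two-predictor formula. The simulated data and function names are hypothetical and are not the study's data or analysis code.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def incremental_r2(cognitive, personality, criterion):
    """Delta R^2 from adding `personality` to a model with `cognitive` only."""
    r_y1 = pearson(cognitive, criterion)
    r_y2 = pearson(personality, criterion)
    r_12 = pearson(cognitive, personality)
    # Squared multiple correlation for two predictors.
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_full - r_y1 ** 2


if __name__ == "__main__":
    # Hypothetical simulated applicants: performance depends on both scores.
    random.seed(1)
    n = 5000
    cog = [random.gauss(0, 1) for _ in range(n)]
    pers = [random.gauss(0, 1) for _ in range(n)]
    perf = [0.5 * c + 0.3 * p + random.gauss(0, 1) for c, p in zip(cog, pers)]
    print(round(incremental_r2(cog, pers, perf), 3))
```

The same quantity is what a hierarchical regression reports as the R² change at the second step; the correlation formula simply avoids fitting the regression explicitly.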

5.
This note shows that the evidence Wooten (1984) cited in support of the diagnostic superiority of standard over local MMPI norms may mainly reflect the fact that a large proportion of Wooten's sample consisted of persons with emotional or behavioral problems. A reevaluation of Wooten's data suggests a range of conditions under which Wooten's diagnostic criterion performs better with local norms than with standard norms, and another range of conditions under which the criterion, used with either local or standard norms, is inferior to base-rate prediction.

6.
Hunter Mabon, Human Performance, 2013, 26(2-3): 289-304
The purpose of this study is threefold: (a) to examine the extent to which two personality measures function in an industrial-organizational context in a different language and culture; (b) to study their construct and concurrent validity; and (c) to relate these findings to utility analyses. Together, these aims address the extent to which personality measurement can yield a positive financial outcome for organizations in a selection situation. Swedish versions of two well-known U.S. tests (Service First, a customer service measure; and the Hogan Personality Inventory [HPI; Hogan & Hogan, 1992], a Big Five personality measure) were administered to several hundred employees, job applicants, and students in a range of organizations. Despite considerable cultural differences (especially in attitudes toward service, education, and life goals), the Swedish norms and factor structures for the two tests were remarkably similar to those of the United States, confirming that tests of this type can be used in different environments. Comparisons of the two tests with each other and with the Myers-Briggs Type Indicator (Consulting Psychologists Press, 1991, 1995) also confirmed that their construct validity had survived the transfer to a new culture and language. Two concurrent criterion validity studies showed that the correlations between test results and different kinds of criterion data were highly significant, suggesting that the tests can be used to forecast work performance. Salary and performance-variation data obtained from the two companies were then used in a utility analysis showing the substantial financial benefit of using personality testing for selection as well as in a downsizing context.

7.
Computerized adaptive testing in personality assessment can improve efficiency by substantially reducing the number of items administered to answer an assessment question. Two approaches to adaptive testing in computerized personality assessment have been explored: item response theory and the countdown method. In this article, the authors review the literature on each and report the results of an investigation of an expanded countdown-method research version of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), the MMPI-2 Computerized Adaptive Version (MMPI-2-CA). The investigation examined its utility, in terms of item and time savings, and its validity, in terms of correlations with external criterion measures. Participants were 433 undergraduate college students (170 men and 263 women). Results indicated considerable item savings and corresponding time savings for the adaptive testing modalities compared with a conventional computerized MMPI-2 administration. Furthermore, computerized adaptive administration yielded test scores and validity coefficients comparable to those of conventional computerized administration of the MMPI-2. Future directions for computerized adaptive personality testing are discussed.
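The core idea of the countdown method can be shown with a minimal sketch of its classification variant: a scale's items are administered one at a time, and administration stops as soon as the respondent either reaches the elevation cutoff or can no longer reach it with the items that remain. Items are assumed to be scored 1 when answered in the keyed direction and 0 otherwise. The function name and cutoff are hypothetical, and the sketch does not reproduce the expanded countdown rules of the MMPI-2-CA.

```python
def countdown_classify(item_scores, cutoff):
    """
    Classify a scale as elevated or not, stopping early once the outcome is settled.

    item_scores: keyed (1) / non-keyed (0) responses in administration order.
                 In live testing these would be collected one item at a time.
    cutoff: raw score at or above which the scale counts as elevated.
    Returns (classification, number_of_items_administered).
    """
    n_items = len(item_scores)
    score = 0
    for administered, item in enumerate(item_scores, start=1):
        score += item
        remaining = n_items - administered
        if score >= cutoff:
            return "elevated", administered  # cutoff already reached
        if score + remaining < cutoff:
            return "not elevated", administered  # cutoff now unreachable
    # Only reached for an empty scale.
    return ("elevated" if score >= cutoff else "not elevated"), n_items


if __name__ == "__main__":
    print(countdown_classify([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], cutoff=3))
    print(countdown_classify([0] * 10, cutoff=3))
```

With 10 items and a cutoff of 3, an all-zero response pattern is classified after 8 items, because once only 2 items remain a score of 0 can no longer reach 3. That early stopping is the source of the item savings.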

8.
Assessment of psychomotor abilities for the prediction of human performance is briefly reviewed, and the reasons psychomotor testing was abandoned for selection applications are described. We review innovations in touch-sensitive computer monitors as a relatively low-cost, highly flexible methodology for developing, validating, and applying standard psychomotor tests. The development and evaluation of five psychomotor test types are described, including discrete-response tests (choice-simple reaction time [RT], serial RT, and tapping) and continuous-response tests (maze tracing and mirror tracing). Two empirical studies of the new psychomotor tests are presented, in which relations with a broad array of perceptual speed and cognitive ability measures provide evidence for construct validity. In addition, some of the psychomotor tests are validated against a real-time simulation criterion (the Kanfer-Ackerman Air Traffic Controller Task©). We argue that these innovations provide a means of revisiting psychomotor testing to augment employee selection batteries.

9.
The relative validities of forced-choice (ipsative) and Likert rating-scale item formats as criterion measures are examined. Although there has been much debate about the relative technical and psychometric merits and demerits of ipsative instruments, the present research focused on a crucial practical question: does the forced-choice format improve validity? An analysis of a meta-analytic data set demonstrates that higher operational validity coefficients (for predicting line-manager ratings of competencies) are associated with a forced-choice (r = .38) rather than a rating-scale (r = .25) item format for the criterion measurement instrument, when the same line managers rate performance on both formats and the predictor is held constant. Thus the apparent criterion-related validity of a predictor can increase by about 50% simply by changing the format of the criterion measurement instrument. The implications of this for practice are discussed.
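The "50%" figure is the relative increase from the rating-scale coefficient (written r_RS here) to the forced-choice coefficient (r_FC), as this worked calculation from the two reported values shows. Expressed as variance accounted for (r²), the difference is larger still.

```latex
\frac{r_{\text{FC}} - r_{\text{RS}}}{r_{\text{RS}}} = \frac{.38 - .25}{.25} = \frac{.13}{.25} = .52
\qquad
\frac{r_{\text{FC}}^{2}}{r_{\text{RS}}^{2}} = \frac{.1444}{.0625} \approx 2.31
```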

10.
Multiple-choice (MC) tests are arguably the most widely used testing format in applied settings. In the psychometric and education literatures, research on the optimal number of options for knowledge and ability MC tests has shown that three-option tests are psychometrically equivalent, and in some cases superior, to five-option tests. Three-option MC tests also offer a number of practical, economic, and administrative advantages. Yet despite these advantages, the three-option format is underused in personnel selection. Across two studies, we compared test-taker perceptions, criterion-related validity, and sex-based subgroup differences on three- and five-option tests; in Study 1, we also compared race-based subgroup differences. Participants in both studies completed a three- or five-option version of the ACT. Test perceptions, criterion-related validity, and race- and sex-based subgroup differences were similar across test formats. The implications for the expanded use of three-option tests in applied settings and future directions for research are discussed.

11.
This study examined the magnitude of differences in standard scores, convergent validity, and concurrent validity when an individual's performance was scored using the revised and the normative update (Woodcock, 1998) editions of the Woodcock Reading Mastery Test, in which the test items are identical but the norms were updated. Participants were 899 first- to third-grade students from three metropolitan areas who had been referred by their teachers to a reading intervention program. Results showed an inverse Flynn effect: scores calculated with the updated norms were systematically inflated, by an average of 5 to 9 standard score points, regardless of gender, IQ, city site, or ethnicity. Inflation was greater at lower raw score levels. Implications for using the updated norms to identify children with reading disabilities, and for changing norms during an ongoing study, are discussed.

12.
Despite ad hoc claims that parents often oppose a school curriculum that is inclusive of gender and sexuality diversity, no research to date has used a psychometrically sound instrument to canvass the reasons why parents may oppose or support such educational policy. The present study addressed this gap by developing and testing a new, multidimensional measure of the theorized nature of parental attitudes toward inclusiveness, the Parental Attitudes Towards Inclusiveness Instrument (PATII). The pilot sample of 998 parents, each with a child attending school in any grade from Kindergarten to Year 12, was drawn from the United Kingdom (UK) and the United States (U.S.) via the online recruitment platform Prolific. The PATII was evaluated for reliability using McDonald's omega, and for construct validity, criterion validity, and measurement invariance using exploratory structural equation modelling (ESEM); initial ESEM analyses were also compared with traditional confirmatory factor analysis (CFA). Scores derived from this measure, and inferences based on those scores, were reliable, valid, and invariant across sex, religiosity, and nationality groups within this sample. Parental sex, religiosity, and nationality group membership were differentially correlated with support for and opposition to an inclusive curriculum. Lastly, the criterion validity of the PATII was supported: as predicted, the instrument's factors were differentially correlated with parents' desired providers of inclusive education. Future national and international use of the PATII offers a critical first step toward informing school and curriculum policy on inclusivity.
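McDonald's omega for items loading on a single factor is (Σλ)² / ((Σλ)² + Σθ), where λ are the items' factor loadings and θ their unique (error) variances. The sketch below assumes standardized loadings and uncorrelated errors, so that θ = 1 − λ² by default; for a multidimensional instrument such as the PATII, omega would be computed separately for each factor. The loadings shown are hypothetical, not PATII estimates.

```python
def mcdonald_omega(loadings, error_variances=None):
    """
    Omega for a single factor from its item loadings.

    loadings: factor loadings of the items on the factor.
    error_variances: unique variances; defaults to 1 - loading^2,
                     which assumes standardized items and loadings.
    """
    if error_variances is None:
        error_variances = [1 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


if __name__ == "__main__":
    # Hypothetical standardized loadings for a four-item factor.
    print(round(mcdonald_omega([0.7, 0.7, 0.7, 0.7]), 3))
```

Unlike coefficient alpha, omega does not assume that all items load equally, which is why it is often preferred when loadings come from a factor model such as CFA or ESEM.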

13.
Although the criterion-related validity of integrity tests is well established, little research has examined which personality constructs contribute to that validity. Moreover, evidence on how well North American findings on integrity tests generalize to non-English-speaking countries is virtually absent. This research addressed these issues with data obtained from employees and students in Canada and Germany (total N = 853). Specifically, we tested the hypotheses that (a) Honesty–Humility, as specified in the HEXACO model of personality, is relatively more important than the Big Five dimensions of personality in accounting for the criterion-related validity of overt integrity tests, whereas (b) the Big Five are relatively more important in explaining the validity of personality-based integrity tests. These predictions were tested using two criteria (counterproductive work behavior and counterproductive academic behavior) and two overt and two personality-based integrity tests. We found evidence of the expected differences between types of integrity tests largely regardless of the sample's culture, the specific test, the criterion, or the population studied, pointing to some degree of generalizability of findings in integrity testing research. Implications include theoretical refinements in integrity testing research and encouragement of practical applications beyond North America.

14.
We explored the effects of drug use history (current/recent user of drugs, used/tried drugs, never tried drugs) and of a measure of drug test consequences (termination versus rehabilitation) on the perceived fairness of organizational drug testing (DT). Data were collected as part of a statewide telephone survey of the general adult population. Personal drug use history and DT consequences interacted such that DT consequences were related to DT fairness only for respondents who were not current users but had past drug use experience. The importance of past drug use in understanding reactions to DT is discussed.

15.
16.
Significant job-relatedness was found for a posttraining job knowledge test criterion using an application of Lawshe's content validity method. This aide job knowledge test was then used as a criterion to assess the predictive validity of a vocabulary test and a civil service test in samples of Black (N = 43) and White (N = 62) psychiatric aides. Both tests showed significant validities, but the vocabulary test was the better predictor of the criterion in both samples. The obtained validities are discussed in terms of differential validity, test fairness, and sample size. This study demonstrated that a content validity method can be applied to criteria as well as to selection tests. It was concluded that content validity methods may help solve the problem of criterion relevance in validation research by providing quantitative evidence of the job-relatedness of criteria.
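Lawshe's content validity ratio (CVR) for an item is (n_e − N/2) / (N/2), where N is the number of subject-matter-expert panelists and n_e is the number who rate the item "essential"; it ranges from −1 to +1, and the mean CVR of the retained items is reported as the content validity index (CVI). The sketch below implements those two formulas. The panel counts are hypothetical, and the abstract does not say exactly how the expert ratings were collected for the criterion test.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half


def content_validity_index(cvrs):
    """Mean CVR across the items retained for the test."""
    return sum(cvrs) / len(cvrs)


if __name__ == "__main__":
    # Hypothetical panel of 10 experts rating three job-knowledge items.
    essential_counts = [9, 8, 6]
    cvrs = [content_validity_ratio(k, 10) for k in essential_counts]
    print([round(c, 2) for c in cvrs], round(content_validity_index(cvrs), 2))
```

A CVR of 0 means exactly half the panel judged the item essential; positive values mean a majority did.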

17.
This paper discusses the roles of validity, cut score choice, and adverse impact in selection system utility, using data from two concurrent validation studies. We contrast an assessment center and a published aptitude test on several metrics, including validity, testing costs, adverse impact, and utility. The assessment center produced slightly lower validity than the aptitude test while costing roughly 10 times as much per candidate. Despite these advantages for the aptitude test, the assessment center produced so much less adverse impact that its operational utility would be higher at the cut scores this organization would likely choose. Potential concerns with applying net utility models to this type of situation are discussed in comparison with gross utility models.
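Gross and net utility can be illustrated with the Brogden-Cronbach-Gleser model. The abstract does not name its utility model, so treat this choice as an assumption. The yearly gain is N_s × r × SD_y × z̄, where N_s is the number hired, r the validity, SD_y the dollar standard deviation of job performance, and z̄ the mean standardized predictor score of those hired (φ(z_cut)/SR under top-down selection on a normally distributed predictor with selection ratio SR). Net utility subtracts N_applicants × cost per applicant. The sketch also computes an adverse impact ratio (focal-group selection rate divided by reference-group selection rate), which U.S. practice commonly compares with the four-fifths (0.8) guideline. Note that the dollar formula itself does not price adverse impact. All inputs are hypothetical.

```python
from statistics import NormalDist


def mean_z_selected(selection_ratio):
    """Mean z-score of hires under top-down selection on a normal predictor."""
    std_normal = NormalDist()
    z_cut = std_normal.inv_cdf(1 - selection_ratio)
    return std_normal.pdf(z_cut) / selection_ratio


def bcg_utility(n_applicants, selection_ratio, validity, sd_y, cost_per_applicant, net=True):
    """Brogden-Cronbach-Gleser utility for one cohort over one year."""
    n_selected = n_applicants * selection_ratio
    gain = n_selected * validity * sd_y * mean_z_selected(selection_ratio)
    return gain - n_applicants * cost_per_applicant if net else gain


def adverse_impact_ratio(focal_hired, focal_applicants, ref_hired, ref_applicants):
    """Ratio of focal-group to reference-group selection rates."""
    return (focal_hired / focal_applicants) / (ref_hired / ref_applicants)


if __name__ == "__main__":
    # Hypothetical: cheaper, more valid test vs. costlier, less valid assessment center.
    test = bcg_utility(200, 0.25, 0.30, 12000, 50)
    center = bcg_utility(200, 0.25, 0.25, 12000, 500)
    print(round(test), round(center))
    print(round(adverse_impact_ratio(10, 60, 40, 140), 2))
```

Because cost enters only the net version, a tenfold cost difference can dominate the comparison under a net model while barely mattering under a gross model. This is one reason the choice between the two models matters here.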

18.
This report describes two studies comparing the criterion-related validity of sex-balanced (“unisex”) interest inventory scales, i.e., scales designed so that the score distributions of males and females are similar, with that of traditional, sex-restrictive scales. Approximately 1,600 college-bound high school seniors (Study 1) and 2,000 college seniors (Study 2) completed both the ACT Interest Inventory (ACT-IV) and the new Unisex Edition of the ACT-IV (UNIACT), which contains sex-balanced items. In both studies, each participant was placed in one of six criterion groups based on the correspondence of expressed occupational choice (Study 1) or actual college major (Study 2) to Holland types. Comparable levels of criterion-related validity were obtained with the unisex scales, with the sex-restrictive scales, and with sex-balanced scores obtained by the traditional procedure of using same-sex norms. These results, together with those of previous research, indicate that (a) psychometrically sound interest inventories can be constructed with sex-balanced items, and (b) counselors may use inventories that provide sex-balanced score reports without sacrificing validity.

19.

Purpose

Berry et al.’s (J Appl Psychol 96:881–906, 2011) meta-analysis of cognitive ability test validity data across employment, college admissions, and military domains demonstrated that validity is lower for Black and Hispanic subgroups than for Asian and White subgroups. However, Berry et al. relied on observed test-criterion correlations and it is therefore not clear whether validity differences generalize beyond observed validities. The present study investigates the roles that range restriction and criterion contamination play in differential validity.

Design/Methodology/Approach

A large dataset (N > 140,000) containing SAT scores and college grades of Asian, Black, Hispanic, and White test takers was used. Within-race corrections for multivariate range restriction were applied. Differential validity analyses were carried out using freshman GPA versus individual course grades as criteria to control for the contaminating influence of individual differences between students in course choice.

Findings

Observed validities underestimated the magnitude of validity differences between subgroups relative to when range restriction and criterion contamination were controlled. Analyses also demonstrate that validity differences would translate to larger regression slope differences (i.e., differential prediction).

Implications

Subgroup differences in range restriction and/or individual differences in course choice cannot account for lower validity of the SAT for Black and Hispanic subgroups. Controlling for these factors increased subgroup validity differences. Future research must look to other explanations for subgroup validity differences.

Originality

The present study is the first differential validity study to simultaneously control for range restriction and individual differences in course choice, and answers a call to investigate potential causes of differential validity.
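As a simpler illustration of what a range restriction correction does, the sketch below applies the univariate Thorndike Case II formula, r_c = r·u / √(1 + r²(u² − 1)) with u = unrestricted SD / restricted SD, within each subgroup. The study itself used multivariate corrections, so this is an assumed simplification, and the subgroup correlations and SD ratios are hypothetical. It shows how correcting within groups can change the size of an observed validity gap; with these inputs the corrected gap is wider than the observed one.

```python
import math


def correct_case_ii(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))


if __name__ == "__main__":
    # Hypothetical within-group observed validities and SD ratios (u).
    groups = {"Group A": (0.35, 1.4), "Group B": (0.25, 1.2)}
    corrected = {g: correct_case_ii(r, u, 1.0) for g, (r, u) in groups.items()}
    observed_gap = groups["Group A"][0] - groups["Group B"][0]
    corrected_gap = corrected["Group A"] - corrected["Group B"]
    print(round(observed_gap, 3), round(corrected_gap, 3))
```

Passing the SD ratio as sd_unrestricted with sd_restricted = 1.0 is just a convenience; only their ratio enters the formula.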

20.
Bornstein RF, Psychological Assessment, 2011, 23(2): 532-544
Although definitions of validity have evolved considerably since L. J. Cronbach and P. E. Meehl's classic (1955) review, contemporary validity research continues to emphasize correlational analyses assessing predictor-criterion relationships, with most outcome criteria being self-reports. The present article describes an alternative way of operationalizing validity: the process-focused (PF) model. The PF model conceptualizes validity as the degree to which respondents can be shown to engage in a predictable set of psychological processes during testing, with those processes dictated a priori by the nature of the instrument(s) used and the context in which testing takes place. In contrast to the traditional approach, in which correlational methods are used to quantify the relationship between test score and criterion, the PF model uses experimental methods to manipulate variables that moderate test score-criterion relationships, enabling researchers to draw more definitive conclusions about the impact of underlying psychological processes on test scores. By complementing outcome-based validity assessment with a process-driven approach, researchers will not only improve psychology's assessment procedures but also enhance their understanding of test bias and test score misuse, by illuminating the intra- and interpersonal factors that lead to differential performance (and differential prediction) in different groups.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417