Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
This study demonstrated the use of quantitative content validity procedures in the development of a job-related behavioral rating scale criterion for entry-level psychiatric aides. Work behavior items were developed by staff from 6 state psychiatric hospitals, placed in a content validity questionnaire using the Lawshe format, and given to a representative sample of 38 aides and supervisors. Seventy-eight of 83 items were found to be significantly job-relevant using the computation procedures of both Lawshe and Aiken. After the significant items were grouped into 4 categories with high interjudge agreement and placed in a rating scale format, ratings were obtained on 72 psychiatric aides from 4 hospitals. Items in the 4 categories were found to be internally consistent using coefficient alpha. Significant but low concurrent validities were established for 2 verbal ability selection tests using the rating criterion. The validities found were interpreted to be especially significant when the factors of low selection ratio, restriction in range, and limited rater training were considered.
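
The Lawshe procedure mentioned in this abstract is usually summarized by the content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item "essential" and N is the panel size. The sketch below is only an illustration of that standard formula; the function name and the count of 30 "essential" ratings are hypothetical, and the abstract does not report per-item counts.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical item: 30 of the 38 panelists rate it "essential".
print(round(content_validity_ratio(30, 38), 3))  # 0.579
```

The ratio runs from -1 (no panelist rates the item essential) through 0 (exactly half do) to +1 (all do); whether a given value counts as significant depends on critical values that vary with panel size.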

2.
VALIDITY GENERALIZATION RESULTS FOR LAW ENFORCEMENT OCCUPATIONS   (Total citations: 1; self-citations: 0; citations by others: 1)
The Schmidt-Hunter interactive validity generalization procedure was applied to validity data for cognitive abilities tests for law enforcement occupations. Both assumed artifact distributions and distributions of artifacts constructed from information contained in the current sample of studies were used to test the hypothesis of situational specificity and to estimate validity generalizability. Results for studies using a criterion of performance in training programs showed that validities ranged from .41 to .71, and for four test types the hypothesis of situational specificity could be rejected using the 75% decision rule. For the remaining test types, validity was generalizable, based on 90% credibility values ranging from .37 to .71. Results for studies using a criterion of performance on the job indicated that the hypothesis of situational specificity was not tenable for three test types, which had validities between .17 and .31. For the remaining test types, estimated mean true validities ranged from .10 to .26 and were generalizable to a majority of situations. Results for both groups of studies were essentially identical for the two types of artifact distribution. Possible reasons for the apparently lower validities and lesser generalizability for job performance criteria are discussed, including possible low validity of the criterion (due to lack of opportunity by supervisors to observe behavior) and the potential role of noncognitive factors in the determination of law enforcement job success. Suggestions for specifically targeted additional research are made.

3.
A COMPARISON OF CRITERIA FOR TEST VALIDATION: A META-ANALYTIC INVESTIGATION   (Total citations: 1; self-citations: 0; citations by others: 1)
Meta-analyses of validity coefficients from tests of clerical abilities for five criteria (supervisor ratings, supervisor rankings, work samples, production quantity, and production quality) were conducted, and the resulting expected true validities were compared. Ratings, rankings, work samples, and production quantity all resulted in high test validities. Validities resulting from ratings and quantity-of-production criteria were highly similar across tests. Validities resulting from rankings and work samples were on the average higher than those from ratings and quantity of production. The fifth criterion, quality of production, had low predictability and did not generalize across situations.

4.
Integrity tests have become a prominent predictor within the selection literature over the past few decades. However, some researchers have expressed concerns about the criterion-related validity evidence for such tests because of a perceived lack of methodological rigor within this literature, as well as a heavy reliance on unpublished data from test publishers. In response to these concerns, we meta-analyzed 104 studies (representing 134 independent samples), which were authored by a similar proportion of test publishers and non-publishers, whose conduct was consistent with professional standards for test validation, and whose results were relevant to the validity of integrity-specific scales for predicting individual work behavior. Overall mean observed validity estimates and validity estimates corrected for unreliability in the criterion (respectively) were .12 and .15 for job performance, .13 and .16 for training performance, .26 and .32 for counterproductive work behavior, and .07 and .09 for turnover. Although data on restriction of range were sparse, illustrative corrections for indirect range restriction did increase validities slightly (e.g., from .15 to .18 for job performance). Several variables appeared to moderate relations between integrity tests and the criteria. For example, corrected validities for job performance criteria were larger when based on studies authored by integrity test publishers (.27) than when based on studies from non-publishers (.12). In addition, corrected validities for counterproductive work behavior criteria were larger when based on self-reports (.42) than when based on other-reports (.11) or employee records (.15).
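
A correction for criterion unreliability of the kind reported above is commonly computed with the classical disattenuation formula, r_corrected = r_observed / sqrt(r_yy), where r_yy is the reliability of the criterion measure. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the reliability of .64 is not taken from the abstract (which reports no reliabilities); it was chosen only because it reproduces the step from .12 to .15.

```python
import math


def correct_for_criterion_unreliability(r_observed: float,
                                        criterion_reliability: float) -> float:
    """Classical disattenuation for criterion unreliability only."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# Assumed reliability of .64 (illustrative, not reported in the abstract).
corrected = correct_for_criterion_unreliability(0.12, 0.64)
print(round(corrected, 2))  # 0.15
```

In an actual meta-analysis the correction is typically applied study by study or through artifact distributions, so a single reliability value like this is a simplification.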

5.
We developed a 36-item scale to measure Openness, using items on the California Psychological Inventory (CPI; Gough, 1957, 1987, 1996), Form 434. Items were initially chosen on the basis of content validity. Five samples (N = 2,375) were used to establish reliability, validity, and norms; 4 samples consisted of university undergraduate students, and 1 comprised applicants for nonmanagement call center jobs. Internal consistency estimates obtained in each sample averaged approximately .75, and test-retest stability, assessed in 1 sample, was estimated at .84. Cross-correlations with related scales, for example, the NEO Personality Inventory-Revised Openness scale (Costa & McCrae, 1992) and other CPI-based scales, provided evidence of construct validity. Statistically significant predictive validities were obtained in 2 call center job-incumbent samples, with range-corrected true validities of .20 to .36 for a number of job performance criteria. Construct and predictive validity were found to be higher than for other scales consisting of CPI items designed to measure Openness or a related construct. Finally, norms were prepared for university undergraduate students (n = 1,847) and nonmanagement service-sector job applicants (n = 528).

6.
A meta-analysis on the validity of tests of general mental ability (GMA) and specific cognitive abilities for predicting job performance and training success in the UK was conducted. An extensive literature search resulted in a database of 283 independent samples with job performance as the criterion (N = 13,262), and 223 with training success as the criterion (N = 75,311). Primary studies were also coded by occupational group, resulting in seven main groups (clerical, engineer, professional, driver, operator, manager, and sales), and by type of specific ability test (verbal, numerical, perceptual, and spatial). Results indicate that GMA and specific ability tests are valid predictors of both job performance and training success, with operational validities in the magnitude of .5–.6. Minor differences between these UK findings and previous US meta-analyses are reported. As expected, operational validities were moderated by occupational group, with occupational families possessing greater job complexity demonstrating higher operational validities between cognitive tests and job performance and training success. Implications for the practical use of tests of GMA and specific cognitive abilities in the context of UK selection practices are discussed in conclusion.

7.
A criterion-related validation was conducted to assess the validity of four aptitude tests and five tests of content taken directly from job tasks in predicting job sample performance of apprentices in eight skilled trades. Observed validities were above .40 (corrected for range restriction, validities averaged .52). Though there were large subgroup mean differences on both predictor and criterion measures, there was no evidence of significant differential prediction.
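
The abstract does not say which range restriction correction was applied. As an illustration only, the sketch below uses the widely cited formula for direct range restriction (Thorndike's Case II), r_c = r·u / sqrt(1 - r² + r²·u²), where u is the ratio of the unrestricted to the restricted predictor standard deviation. The function name and the value u = 1.5 are assumptions, not figures from the study.

```python
import math


def correct_for_direct_range_restriction(r_restricted: float, u: float) -> float:
    """Thorndike Case II correction.

    u = SD of the predictor in the unrestricted (applicant) group
        divided by its SD in the restricted (selected) group.
    """
    if u <= 0:
        raise ValueError("u must be positive")
    r2 = r_restricted ** 2
    return (r_restricted * u) / math.sqrt(1 - r2 + r2 * u ** 2)


# Hypothetical inputs: observed r of .40 and an assumed SD ratio of 1.5.
print(round(correct_for_direct_range_restriction(0.40, 1.5), 3))
```

With u = 1 (no restriction) the formula returns the observed correlation unchanged, and for u > 1 the corrected value is larger than the observed one.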

8.
Review and meta-analyses of published validation studies for the years 1964-1982 of Journal of Applied Psychology and Personnel Psychology were undertaken to examine the effect of (1) research design; (2) criterion used; (3) type of selection instrument used; (4) occupational group studied; and (5) predictor-criterion combination on the level of observed validity coefficients. Results indicate that concurrent validation designs produce validity coefficients roughly equivalent to those obtained in predictive validation designs and that both of these designs produce higher validity coefficients than does a predictive design which includes use of the selection instrument. Of the criteria examined, performance rating criteria generally produced lower validity coefficients than did the use of other more "objective" criteria. In comparing the validities of various types of predictors, it was found that cognitive ability tests were not superior to other predictors such as assessment centers, work samples, and supervisory/peer evaluations, as has been found in previous meta-analytic work. Personality measures were clearly less valid. Compared to previous validity generalization work, much unexplained variance in validity coefficients remained after corrections for differences in sample size. Finally, the studies reviewed were deficient for our purposes with respect to the data reported. Selection ratios, standard deviations, reliabilities, and predictor and criterion intercorrelations were rarely and inconsistently reported. There are also many predictor-criterion relationships for which very few validation efforts have been undertaken.

9.
Differences in test-taker perceptions between overt and personality-based integrity tests were examined. Following administration of both types of integrity tests, 255 undergraduate students provided ratings of perceived face validity and perceived predictive validity. Following receipt of actual test scores, 126 test takers participated in a second phase of the study in which they reported perceptions of distributive justice. Test takers perceived overt integrity tests as having greater face validity and predictive validity than personality-based integrity tests. Perceptions of job-relatedness were not strongly related to test performance on either test type. Distributive justice perceptions were related to test performance, but not type of integrity test.

10.
The Great Eight competencies: a criterion-centric approach to validation   (Total citations: 1; self-citations: 0; citations by others: 1)
The author presents results of a meta-analysis of 29 validation studies (N = 4,861) that uses the Great Eight competency factors (Kurz & Bartram, 2002) as the criterion measurement framework. Predictors of the Great Eight competencies based only on personality scales show moderate to good correlations with line-manager ratings for all 8 of the competencies. On their own, ability tests correlate with 4 of the 8 competencies, and together ability and personality data yield operational validities ranging from 0.20 to 0.44 for the 8 competencies. Operational validities for aggregated predictors with aggregated criteria were estimated to be 0.53. The value of differentiating the criterion space and of relating predictor variables to criterion variables in a one-to-one fashion is discussed.

11.
Although most studies of criterion-related validity focus on univariate relationships, the complex and multidimensional nature of the performance construct and the widespread use of multiple selection devices argue in favor of multivariate frameworks for evaluating validity. Using a Monte Carlo simulation, we estimated the validity of general cognitive ability tests and personality tests in predicting "job performance," where performance is conceptualized as a composite of multiple performance measures (i.e., individual job task performance and organizational citizenship behaviors). The validity of a selection battery varies substantially as a function of the relative weight given to both predictors and criteria; the 95% confidence interval for validities ranged from .20 to .78. The effective weights given to performance dimensions accounted for 34% of the variance in selection battery validities; depending on precisely how "performance" is defined, the same test battery can have relatively high or relatively low levels of validity. Our model suggests that the way an organization defines job performance is a source of true and important variability in validities, and that the validity of selection tests for predicting complex performance criteria may show considerably less generalizability than current meta-analyses of univariate validities would suggest.

12.
Six nonstressful personality instruments were concurrently validated using respondent and close friend ratings on specific scale dimensions. Tests were administered to 203 predominantly Caucasian college students. Prior to testing, each respondent (through self-ratings) and a close friend (through ratings of the respondent) had estimated the strengths of the tested personality variables on a seven-point scale. The definitions of the various personality dimensions were taken from publishers' manuals. An estimated whole test validity was obtained by an average of the individual scale validities using a conversion of Pearson's r to z'. Significant differences among and between subtest validities were found by ANOVA. All "self-rating" validities were significant (.001), with two of these self-rating validities significantly different (.01) from those of the other four tests. Only two "other-rating" validities were significant. All six tests appeared to be valid for college population use. From preliminary analysis, counselees could rate themselves accurately on most test measures. In some cases self-ratings might be used in lieu of giving the test.
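
Averaging correlations through a conversion of Pearson's r to z', as this abstract describes, is commonly done with Fisher's transformation: convert each r with atanh, average the z' values, and convert the mean back with tanh. The sketch below assumes an unweighted mean (the abstract does not say whether weights were used), and the three scale validities are hypothetical.

```python
import math
from typing import Sequence


def mean_correlation_via_fisher_z(correlations: Sequence[float]) -> float:
    """Average correlations on Fisher's z' scale, then back-transform to r."""
    if not correlations:
        raise ValueError("at least one correlation is required")
    if any(abs(r) >= 1 for r in correlations):
        raise ValueError("each correlation must lie strictly between -1 and 1")
    z_values = [math.atanh(r) for r in correlations]  # r -> z'
    mean_z = sum(z_values) / len(z_values)            # unweighted mean of z'
    return math.tanh(mean_z)                          # z' -> r


# Hypothetical scale validities for one instrument.
scale_validities = [0.20, 0.40, 0.60]
print(round(mean_correlation_via_fisher_z(scale_validities), 3))
```

Averaging on the z' scale gives slightly more weight to large correlations than a plain arithmetic mean of the r values would.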

13.
The correlation between cognitive ability test scores and performance was separately meta-analyzed for Asian, Black, Hispanic, and White racial/ethnic subgroups. Compared to the average White observed correlation (mean r = .33, N = 903,779), average correlations were lower for Black samples (mean r = .24, N = 112,194) and Hispanic samples (mean r = .30, N = 51,205) and approximately equal for Asian samples (mean r = .33, N = 80,705). Despite some moderating effects (e.g., type of performance criterion, decade of data collection, job complexity), validity favored White over Black and Hispanic test takers in almost all conditions that included a sizable number of studies. Black-White validity comparisons were possible both across and within the 3 broad domains that use cognitive ability tests for high-stakes selection and placement: civilian employment, educational admissions, and the military. The trend of lower Black validity was repeated in each domain; however, average Black-White validity differences were largest in military studies and smallest in educational and employment studies. Further investigation of the reasons for these validity differences is warranted.

14.
The study tests the distinction between typical and maximum criteria with ratings of transformational leadership performance, and examines whether the criterion-related validities of the five factor model differ for the two types of criteria. Using an East Asian military sample (n = 1,259) where multiple ratings of typical and maximum performance were obtained from different sources, we used structural equation modeling to test the typical/maximum performance distinction. Results showed that typical and maximum performance are different latent constructs and that this distinction is present even after considering rating method factors (i.e., rater source, time). The importance of this distinction is shown by the fact that the personality constructs were not equally predictive of both criteria: Openness was most predictive of maximum performance, Neuroticism was most predictive of typical performance, and Extroversion was predictive of both. By distinguishing typical from maximum performance constructs, relationships between personality and transformational leadership were found to be stronger than previous research suggested.

15.
In this study, we attempted to explore the construct validity of the Kendrick Battery by using an American sample and psychometric tests as indexes of diffuse organicity, depression, and normality. Institutionalized residents (N = 53) were tested twice (6-week interval). When organicity was defined by disorientation and memory deficits, then both the Object Learning test and the Digit Copying test were accurate in differentiating preestablished criterion groups. When organicity was defined more broadly, including sensorimotor function, the Digit Copying test alone was more accurate when depression was defined in terms of irritability, restlessness, and despair. These data suggest that although the Kendrick scales appeared to be sensitive to organicity and depression in this sample, their validity varied with the criteria for each when such were defined psychometrically.

16.
This study describes the development of a multidimensional biodata form which used explicit constructs to guide item generation and rational scale development, construct validation, criterion measurement, and empirical keying. These constructs were goal-orientation, teamwork, customer service, resourcefulness, learning ability, and leadership. Exploratory and confirmatory factor analyses in both applicant and incumbent samples were used to identify and test the model, which included the thirteen more differentiated rational scales relating to these six broader constructs. Empirical keying of the rationally developed scales was conducted against criterion construct scales conceptually related to each predictor construct. Empirical keying at the item level was found to result in higher validities and cross-validities than either empirical keying at the scale level or rational keying. The item-keyed instrument also demonstrated incremental validity over a test of cognitive ability for specific work performance domains as well as overall work performance.

17.
This article describes the development and validation of 2 measures of emotional intelligence (EI): the Situational Test of Emotional Understanding (STEU) and the Situational Test of Emotion Management (STEM). Study 1 (N = 207 psychology students) examines multiple sources of validity evidence: relationships with EI, vocabulary, personality, and emotion-related criteria. Study 2 (N = 149 white-collar volunteers) relates STEU and STEM scores to clinical symptoms, finding relationships to anxiety and stress for both tests, and to depression for the STEM. It is concluded that new performance-based approaches to test development, such as the present ones, might be useful in distinguishing between test and construct effects. Implications for expanding theory and for developing EI interventions are discussed.

18.
This article contributes to the understanding of why the use of a frame-of-reference leads to increased criterion-related validity of personality inventories. Two competing explanations are described and tested. A between-subjects (N = 337) and a within-subject (N = 105) study are conducted to test the hypothesized effects of use of a frame of reference on reliability and validity. Regarding the effects on reliability, use of a frame of reference reduces within-person inconsistency (instead of between-person variability) in responding to generic items. Use of a frame of reference further leads to higher validity as a result of the reduction of between-person variability and within-person inconsistency. Yet, reducing these inconsistencies is not enough. It is also important to use a frame of reference that is conceptually relevant to the criterion. Besides implications for contextualized personality inventories, these results provide an explanation for the moderate validities of generic personality inventories.

19.
In a simulated employee selection exercise, two hundred and ten participants responded to a personality test that varied in terms of item invasiveness and item face validity. A third factor of empirical job-relatedness was manipulated via test instructions. Reactions to the test, the organization, and behavioral intentions about accepting a job offer were measured. Results indicated that item invasiveness and face validity interacted in the prediction of all dependent variables. Specifically, item invasiveness affected the dependent variables only when face validity was low. Implications for the use of personality tests and integrity tests were discussed.

20.
Situational judgment tests (SJTs) are a measurement method that may be designed to assess a variety of constructs. Nevertheless, many studies fail to report the constructs measured by the situational judgment tests in the extant literature. Consequently, a construct-level focus in the situational judgment test literature is lacking, and researchers and practitioners know little about the specific constructs typically measured. Our objective was to extend the efforts of previous researchers (e.g., McDaniel, Hartman, Whetzel, & Grubb, 2007; McDaniel & Nguyen, 2001; Schmitt & Chan, 2006) by highlighting the need for a construct focus in situational judgment test research. We identified and classified the construct domains assessed by situational judgment tests in the literature into a content-based typology. We then conducted a meta-analysis to determine the criterion-related validity of each construct domain and to test for moderators. We found that situational judgment tests most often assess leadership and interpersonal skills and that those situational judgment tests measuring teamwork skills and leadership have relatively high validities for overall job performance. Although based on a small number of studies, we found evidence that (a) matching the predictor constructs with criterion facets improved criterion-related validity; and (b) video-based situational judgment tests tended to have stronger criterion-related validity than pencil-and-paper situational judgment tests, holding constructs constant. Implications for practice and research are discussed.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号