Similar Articles
20 similar articles found
1.
This study addresses 3 questions regarding assessment center construct validity: (a) Are assessment center ratings best thought of as reflecting dimension constructs (dimension model), exercises (exercise model), or a combination? (b) To what extent do dimensions or exercises account for variance? (c) Which design characteristics increase dimension variance? To this end, a large set of multitrait-multimethod studies (N = 34) were analyzed, showing that assessment center ratings were best represented (i.e., in terms of fit and admissible solutions) by a model with correlated dimensions and exercises specified as correlated uniquenesses. In this model, dimension variance equals exercise variance. Significantly more dimension variance was found when fewer dimensions were used and when assessors were psychologists. Use of behavioral checklists, a lower dimension-exercise ratio, and similar exercises also increased dimension variance.

2.
This study presents a simultaneous examination of multiple evidential bases of the validity of assessment center (AC) ratings. In particular, we combine both construct-related and criterion-related validation strategies in the same sample to determine the relative importance of exercises and dimensions. We examine the underlying structure of ACs in terms of exercise and dimension factors while directly linking these factors to a work-related criterion (salary). Results from an AC (N = 753) showed that exercise factors not only explained more variance in AC ratings than dimension factors but also were more important in predicting salary. Dimension factors explained a smaller albeit significant portion of the variance in AC ratings and had lower validity for predicting salary. The implications of these findings for AC theory, practice, and research are discussed.

3.
A novel assessment center (AC) structure that models broad dimension factors, exercise factors, and a general performance factor is proposed and supported in 4 independent samples of AC ratings. Consistent with prior research, the variance attributable to dimension and exercise factors varied widely across ACs. To investigate the construct validity of these empirically supported components of AC ratings, the nomological network of broad dimensions, exercises, and general performance was examined. Results supported the criterion‐related validity of broad dimensions and exercises as predictors of effectiveness and success criteria as well as the incremental validity of broad dimensions beyond exercises and general performance. Finally, the relationships between individual differences and AC factors supported the construct validity of broad dimension factors and provided initial insight as to the meaning of exercise-specific variance and general AC performance.

4.
Why Assessment Centers Do Not Work the Way They Are Supposed To
Assessment centers (ACs) are often designed with the intent of measuring a number of dimensions as they are assessed in various exercises, but after 25 years of research, it is now clear that AC ratings that are completed at the end of each exercise (commonly known as postexercise dimension ratings) substantially reflect the effects of the exercises in which they were completed and not the dimensions they were designed to reflect. This is the crux of the long-standing "construct validity problem" for AC ratings. I review the existing research on AC construct validity and conclude that (a) contrary to previous notions, AC candidate behavior is inherently cross-situationally (i.e., cross-exercise) specific, not cross-situationally consistent as was once thought, (b) assessors rather accurately assess candidate behavior, and (c) these facts should be recognized in the redesign of ACs toward task- or role-based ACs and away from traditional dimension-based ACs.

5.
6.
Research indicates that assessment center (AC) ratings typically demonstrate poor construct validity; that is, they do not measure the intended dimensions of managerial performance (e.g., Sackett & Harris, 1988). The purpose of this study was to investigate the construct validity of dimension ratings from a developmental assessment center (N=102), using multitrait-multimethod analysis and factor analysis. The relationships between AC ratings, job performance ratings, and personality measures also were investigated. Results indicate that the AC ratings failed to demonstrate construct validity. The ratings did not show the expected relationships with the job performance and personality measures. Additionally, the factors underlying these ratings were found to be the AC exercises, rather than the managerial dimensions as expected. Potentially, this lack of construct validity of the dimension ratings is a serious problem for a developmental assessment center. There is little evidence that the managerial weaknesses identified by the AC are the dimensions that actually need to be improved on the job. Methods are discussed for improving the construct validity of AC ratings, for example, by decreasing the cognitive demands on the assessors. This study is based on a dissertation submitted to North Carolina State University. Portions of this paper were presented at the meeting of the Society for Industrial and Organizational Psychology in Montreal, Quebec, May, 1992. I am grateful to Paul Thayer, Bert Westbrook, James W. Cunningham, and Patrick Hauenstein for their contributions to this research. I also thank several anonymous reviewers for their comments on this article.

7.
This study examined the construct‐related validity of an assessment centre (AC) developed by a national distribution company for the selection and development of lower‐grade managers. In five locations throughout Britain, 487 individuals were observed on nine dimensions, each of which was measured through six distinct exercises. Multitrait‐multimethod analyses conducted to investigate the convergent and discriminant validity of the AC revealed strong exercise (“method”) effects. This finding was corroborated by an exploratory factor analysis showing that AC ratings clustered into factors according to exercises, rather than according to performance dimensions. A series of MANOVAs and chi‐squared tests demonstrated that neither the exercise ratings nor the selection decision were biased by sex, ethnicity, or training location, and a logistic regression determined which exercises had most impact on the final decision.

8.
This study considered the validity of the personality structure based on the Five‐Factor Model using both self‐ and peer reports on twins' NEO‐PI‐R facets. Separating common from specific genetic variance in self‐ and peer reports, this study examined genetic substance of different trait levels and rater‐specific perspectives relating to personality judgments. Data of 919 twin pairs were analyzed using a multiple‐rater twin model to disentangle genetic and environmental effects on domain‐level trait, facet‐specific trait, and rater‐specific variance. About two thirds of both the domain‐level trait variance and the facet‐specific trait variance was attributable to genetic factors. This suggests that the more accurately personality is measured, the better these measures reflect the genetic structure. Specific variance in self‐ and peer reports also showed modest to substantial genetic influence. This may indicate not only genetically influenced self‐rater biases but also substance components specific to self‐ and peer raters' perspectives on the traits actually measured.

9.
The authors reanalyzed assessment center (AC) multitrait-multimethod (MTMM) matrices containing correlations among postexercise dimension ratings (PEDRs) reported by F. Lievens and J. M. Conway (2001). Unlike F. Lievens and J. M. Conway, who used a correlated dimension-correlated uniqueness model, we used a different set of confirmatory-factor-analysis-based models (1-dimension-correlated exercise and 1-dimension-correlated uniqueness models) to estimate dimension and exercise variance components in AC PEDRs. Results of reanalyses suggest that, consistent with previous narrative reviews, exercise variance components dominate over dimension variance components after all. Implications for AC construct validity and possible redirections of research on the validity of ACs are discussed.

10.
Rating Dimensions and Rating Effectiveness in Assessment Centers
This article provides a fairly systematic review of recent domestic and international research on assessment centers. First, it discusses the influence of the number of rating dimensions on rating outcomes, as well as four meta-dimensions identified in assessment centers. Second, it introduces the indices used to evaluate rating effectiveness in assessment centers and discusses the classification of assessor training and its influence on rating effectiveness. Third, although assessment centers show good criterion-related validity, research on their construct validity has not yet reached a consistent conclusion. Finally, future research trends for assessment centers are discussed.

11.
It has been suggested that the large cognitive demands during the observation of assessment center (AC) participants can impair the quality of the assessors' ratings. An aspect that is especially relevant in this regard is the number of candidates that assessors have to observe simultaneously during group discussions, which are one of the most commonly used AC exercises. The present research evaluated potential impairments of the quality of the assessors' ratings (construct‐ and criterion‐related validity and rating accuracy) related to the number of to‐be‐observed candidates. Study 1 (N=1046) was a quasi‐experimental field study and Study 2 (N=71) was an experimental laboratory study. Both studies found significant impairments of assessors' rating quality when a larger rather than a smaller number of candidates had to be observed simultaneously. These results suggest that assessors should not have to observe too many candidates at the same time during AC group discussions.

12.
Missing data are common in test scoring, and making effective use of the available data is a key problem for statistical analysis. In test scoring, the influence of items and raters on scores cannot be ignored. Based on the principles of generalizability theory, formulas for estimating the variance components of a two-facet crossed design (p×i×r) with missing data were derived according to the scoring rules, and multiple sets of missing data were simulated in Matlab 7.0 to verify the validity of these formulas. The results showed that: (1) the derived formulas are fairly reliable, with relatively small bias in the estimated variance components; even when more than 50% of the data were missing, the formulas still estimated the variance components rather accurately; (2) the number of items had the largest influence on variance component estimation with missing data, followed by the number of raters; with 6 items and 5 raters, the formulas yielded stable estimates; (3) the number of examinees had little influence on the variance component estimates; whether for small-scale or large-scale examinations, the variance components estimated from missing data by generalizability theory differed little.
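For complete data, the variance components of the two-facet fully crossed p × i × r design described above follow from the classical random-effects ANOVA expected mean squares. A minimal complete-data sketch is below (the article's actual contribution, formulas for missing data, is not implemented here; the additive demo data are hypothetical):

```python
import numpy as np

def pxixr_variance_components(x):
    """ANOVA-based variance component estimates for a fully crossed
    person x item x rater (p x i x r) design, one score per cell.
    x: array of shape (n_p, n_i, n_r)."""
    n_p, n_i, n_r = x.shape
    m = x.mean()
    mp = x.mean(axis=(1, 2)); mi = x.mean(axis=(0, 2)); mr = x.mean(axis=(0, 1))
    mpi = x.mean(axis=2); mpr = x.mean(axis=1); mir = x.mean(axis=0)

    # Mean squares for each effect
    ms = {}
    ms['p'] = n_i * n_r * ((mp - m) ** 2).sum() / (n_p - 1)
    ms['i'] = n_p * n_r * ((mi - m) ** 2).sum() / (n_i - 1)
    ms['r'] = n_p * n_i * ((mr - m) ** 2).sum() / (n_r - 1)
    ms['pi'] = n_r * ((mpi - mp[:, None] - mi[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_i - 1))
    ms['pr'] = n_i * ((mpr - mp[:, None] - mr[None, :] + m) ** 2).sum() / ((n_p - 1) * (n_r - 1))
    ms['ir'] = n_p * ((mir - mi[:, None] - mr[None, :] + m) ** 2).sum() / ((n_i - 1) * (n_r - 1))
    resid = (x - mpi[:, :, None] - mpr[:, None, :] - mir[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mr[None, None, :] - m)
    ms['pir'] = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the variance components
    v = {}
    v['pir'] = ms['pir']                                  # confounded with error
    v['pi'] = (ms['pi'] - ms['pir']) / n_r
    v['pr'] = (ms['pr'] - ms['pir']) / n_i
    v['ir'] = (ms['ir'] - ms['pir']) / n_p
    v['p'] = (ms['p'] - ms['pi'] - ms['pr'] + ms['pir']) / (n_i * n_r)
    v['i'] = (ms['i'] - ms['pi'] - ms['ir'] + ms['pir']) / (n_p * n_r)
    v['r'] = (ms['r'] - ms['pr'] - ms['ir'] + ms['pir']) / (n_p * n_i)
    return v

# Hypothetical purely additive scores: interactions should be estimated as zero,
# and each main component should equal the sample variance of its effect vector.
a = np.arange(4.0)              # person effects
b = np.array([0.0, 2.0])        # item effects
c = np.array([0.0, 1.0, 5.0])   # rater effects
x = a[:, None, None] + b[None, :, None] + c[None, None, :]
v = pxixr_variance_components(x)
```

With these additive data, `v['p']`, `v['i']`, and `v['r']` reduce to the ddof-1 sample variances of `a`, `b`, and `c`, and all interaction components vanish, which is a useful sanity check on the EMS algebra.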

13.
This meta-analysis tested a series of moderators of sex- and race-based subgroup differences using assessment center (AC) field data. We found that sex-based subgroup differences favoring female assessees were smaller among studies that reported: combining AC scores with other tests to compute overall assessment ratings, lower mean correlations between rating dimensions, using more than one assessor to rate assessees in exercises, and providing assessor training. In contrast, we found larger sex-based subgroup differences favoring female assessees among studies that reported: lower proportions of females in assessee pools, conducting a job analysis to design the AC, and using multiple observations of AC dimensions across exercises. We also observed a polynomial effect showing that subgroup differences most strongly favored female assessees in jobs with the highest and lowest rates of female incumbents. We found race-based subgroup differences favoring White assessees were smaller on less cognitively loaded rating dimensions and for jobs with lower rates of Black incumbents. Studies reporting greater overall methodological rigor also showed smaller subgroup differences favoring White assessees. Regarding specific rigor features, studies reporting use of highly qualified assessors and integrating dimension ratings from separate exercises into overall dimension scores showed significantly lower differences favoring White assessees.

14.
This study investigated leniency and similar‐to‐me bias as mechanisms underlying demographic subgroup differences among assessees in assessors' initial dimension ratings from three assessment center (AC) simulation exercises used as part of high‐stakes promotional testing. It examined whether even small individual‐level effects can accumulate (i.e., “trickle‐up”) to produce larger subgroup‐level differences. Individual‐level analyses were conducted using cross‐classified multilevel modeling and conducted separately for each exercise. Results demonstrated weak evidence of leniency toward White assessees and similar‐to‐me bias among non‐White assessee–assessor pairs. Similar leniency was found toward female assessees, but no statistically significant effects were found for assessee or assessor gender or assessee–assessor gender similarity. Using traditional d effect size estimates, weak individual‐level assessee effects translated into small but consistent subgroup differences favoring White and female assessees. Generally small but less consistent subgroup differences indicated that non‐White and male assessors gave higher ratings. Moreover, analyses of overall promotion decisions indicate the absence of adverse impact. Findings from this AC provide some support for the “trickle‐up” effect, but the effect on subgroup differences is trivial. The results counter recent reviews of AC studies suggesting larger than previously assumed subgroup differences. Consequently, the findings demonstrate the importance of following established best practices when developing and implementing the AC method for selection purposes to minimize subgroup differences.
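The "traditional d effect size estimates" mentioned here are standardized mean differences between subgroups, scaled by the pooled standard deviation. A minimal sketch (the two rating vectors are hypothetical illustration data, not from the study):

```python
import math

def cohens_d(g1, g2):
    """Cohen's d: standardized mean difference between two groups,
    using the pooled (ddof=1) standard deviation as the scale."""
    n1, n2 = len(g1), len(g2)
    m1 = sum(g1) / n1
    m2 = sum(g2) / n2
    s1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)   # sample variance, group 1
    s2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)   # sample variance, group 2
    pooled = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Hypothetical overall assessment ratings for two subgroups
d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

A negative d here simply means the first group's mean is lower; the "small but consistent" differences the abstract reports correspond to |d| values well under conventional medium-effect benchmarks.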

15.
The present study replicated and extended research concerning a recently suggested conceptual model of the underlying factors of dimension ratings in assessment centers (ACs) proposed by Hoffman, Melchers, Blair, Kleinmann, and Ladd that includes broad dimension factors, exercise factors, and a general performance factor. We evaluated the criterion-related validity of these different components and expanded their nomological network. Results showed that all components (i.e., broad dimensions, exercises, general performance) were significant predictors of training performance. Furthermore, broad dimensions showed incremental validity beyond exercises and general performance. Finally, relationships between the AC factors and individual difference constructs (e.g., Big Five, core self-evaluations, positive and negative affectivity) supported the construct-related validity of broad dimensions and provided further insights into the nature of the different AC components.

16.
The present study was designed to obtain validity estimates for a role-play test. Participants were 125 French Navy officers who were rated by a pool of professional assessors and psychologists. All the assessors received recurring training sessions focusing on the behavioral checklist, on rating errors, and on a shared frame of reference. The assessment procedure included a role-play exercise, a cognitive ability scale (g factor), and a personality scale (Big Five factors). First, exploratory factor analyses were conducted on the data gathered, and four factors were identified (authoritarianism, oral communication, consideration for others, and frankness). In a nomological perspective, we also analysed the links between the exercise dimensions, the personality inventory, and the intelligence scale. The findings suggest that the role-play dimensions, personality, and intelligence seem to measure different things.

17.
The present study was concerned with investigating the nature of assessment center exercises. Bem and Funder's (1978) technique of classifying and comparing situations in behavioral terms was applied to the measurement of exercises in an assessment center with ratings reflecting exercise factors. Six assessors created templates for each of the four exercises. Intercorrelations between the mean template ratings suggested that the exercises were not viewed as similar situations. Moreover, some relationship was found between exercise similarity and performance consistency. It is proposed that these differences in exercise content may be responsible for the inability to find cross-situational consistencies in candidate behavior in a typical assessment center. Practical implications and extensions of this study are discussed.

18.
In performance appraisals, some assessors are substantially more lenient than others. Research on this effect in appraisals involving communication and interaction between raters and ratees after the performance evaluation has taken place indicates that it may be at least partly caused by individual differences in assessor personality. However, little is known about the impact or causes of rater severity versus leniency in situations in which there is little or no contact between raters and ratees after the performance evaluation. In Study 1 (N = 174) the strength of the severity–leniency effect in this ‘no‐contact’ context is estimated and found to be similar to that reported for ‘with‐contact’ appraisals. No evidence of an association between assessor personality and assessor severity (vs. leniency) is found in the ‘no‐contact’ context. In Study 2 (N = 54) there is no evidence of an association between the fluid cognitive ability of assessors and the severity of their ratings in a no‐contact context. It is concluded that the severity versus leniency effect probably has a considerable impact on performance ratings in ‘no‐contact’ appraisal settings, but that neither rater personality nor rater cognitive ability appear to play a significant role in this.

19.
Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance estimators for the multiple‐rater generalized π and AC1 statistics, whose validity does not depend upon the hypothesis of independence between raters. This is an improvement over existing alternative variances, which depend on the independence assumption. A Monte‐Carlo simulation study demonstrates the validity of these variance estimators for confidence interval construction, and confirms the value of AC1 as an improved alternative to existing inter‐rater reliability statistics.
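The kappa paradox described here is easy to reproduce: when category marginals are highly skewed, two raters can agree on 90% of cases yet obtain a negative kappa. A two-rater sketch of Cohen's kappa and Gwet's AC1 follows (the paper's multiple-rater generalizations and variance estimators are not implemented; the rating data are hypothetical):

```python
from collections import Counter

def agreement_stats(r1, r2):
    """Cohen's kappa and Gwet's AC1 for two raters on nominal data.
    Both have the form (Pa - Pe) / (1 - Pe); they differ only in
    how chance agreement Pe is defined."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    q = len(cats)
    # Observed agreement
    pa = sum(a == b for a, b in zip(r1, r2)) / n
    # Marginal category proportions per rater (Counter returns 0 for absent keys)
    p1, p2 = Counter(r1), Counter(r2)
    # Kappa: chance agreement from the product of the raters' marginals
    pe_kappa = sum((p1[c] / n) * (p2[c] / n) for c in cats)
    # AC1: chance agreement from average marginals, pe = sum pi(1-pi) / (q-1)
    pi = {c: (p1[c] / n + p2[c] / n) / 2 for c in cats}
    pe_ac1 = sum(pi[c] * (1 - pi[c]) for c in cats) / (q - 1)
    kappa = (pa - pe_kappa) / (1 - pe_kappa)
    ac1 = (pa - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# Skewed marginals: 90 of 100 items agreed on, yet kappa comes out negative
r1 = ["yes"] * 90 + ["yes"] * 5 + ["no"] * 5
r2 = ["yes"] * 90 + ["no"] * 5 + ["yes"] * 5
kappa, ac1 = agreement_stats(r1, r2)
```

With these data, observed agreement is 0.90, but kappa's chance term is 0.905 (both raters say "yes" 95% of the time), driving kappa below zero, while AC1's chance term stays at 0.095 and AC1 remains high, which is exactly the stability the abstract claims for AC1.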

20.
Recent Monte Carlo research has illustrated that the traditional method for assessing the construct-related validity of assessment center (AC) post-exercise dimension ratings (PEDRs), an application of confirmatory factor analysis (CFA) to a multitrait-multimethod matrix, produces inconsistent results [Lance, C. E., Woehr, D. J., & Meade, A. W. (2007). Case study: A Monte Carlo investigation of assessment center construct validity models. Organizational Research Methods, 10, 430-448]. To avoid this shortcoming, a variance partitioning procedure was applied to the examination of the PEDRs of 193 individuals. Overall, results indicated that the person, dimension, and person by dimension interaction effects together accounted for approximately 32% of the total variance in AC ratings. However, despite no apparent exercise effect, the person by exercise interaction accounted for approximately 28% of the total variance. Although these results are drawn from a single AC, they nevertheless provide general support for the overall functioning of ACs and encourage continued application of variance partitioning approaches to AC research. Implications for AC design and research are discussed.

