Similar Literature
20 similar documents retrieved.
1.
Recent research has questioned the importance of rater perspective effects on multisource performance ratings (MSPRs). Although this research makes a valuable contribution, we hypothesize that it has obscured evidence for systematic rater source effects as a result of misspecified models of the structure of multisource performance ratings and inappropriate analytic methods. Accordingly, this study provides a reexamination of the impact of rater source on multisource performance ratings by presenting a set of confirmatory factor analyses of two large samples of multisource performance rating data in which source effects are modeled in the form of second-order factors. Hierarchical confirmatory factor analysis of both samples revealed that the structure of multisource performance ratings can be characterized by general performance, dimensional performance, idiosyncratic rater, and source factors, and that source factors explain (much) more variance, and general performance (much) less variance, in multisource performance ratings than was previously believed. These results reinforce the value of collecting performance data from raters occupying different organizational levels and have important implications for research and practice.

2.
This study quantified the effects of 5 factors postulated to influence performance ratings: the ratee's general level of performance, the ratee's performance on a specific dimension, the rater's idiosyncratic rating tendencies, the rater's organizational perspective, and random measurement error. Two large data sets, consisting of managers (n = 2,350 and n = 2,142) who received developmental ratings on 3 performance dimensions from 7 raters (2 bosses, 2 peers, 2 subordinates, and self), were used. Results indicated that idiosyncratic rater effects (62% and 53%) accounted for over half of the rating variance in both data sets. The combined effects of general and dimensional ratee performance (21% and 25%) were less than half the size of the idiosyncratic rater effects. Small perspective-related effects were found in boss and subordinate ratings but not in peer ratings. Average random error effects in the 2 data sets were 11% and 18%.
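The variance partition described above can be illustrated with a toy simulation. This is a sketch, not the study's estimation method: it generates four of the five postulated components (the small perspective effect is omitted), with hypothetical standard deviations chosen so that the variance shares roughly echo the reported percentages.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ratees, n_raters, n_dims = 500, 7, 3

# Hypothetical component SDs (not the study's estimates)
general   = rng.normal(0, 1.0, (n_ratees, 1, 1))         # ratee's general performance
dimension = rng.normal(0, 0.5, (n_ratees, 1, n_dims))    # ratee x dimension performance
rater     = rng.normal(0, 1.6, (n_ratees, n_raters, 1))  # idiosyncratic rater effect
error     = rng.normal(0, 0.7, (n_ratees, n_raters, n_dims))  # random measurement error
ratings = general + dimension + rater + error

# Empirical share of total rating variance contributed by each component
parts = {"general": general, "dimension": dimension, "rater": rater, "error": error}
variances = {k: np.var(np.broadcast_to(v, ratings.shape)) for k, v in parts.items()}
total = sum(variances.values())
for name, var in variances.items():
    print(f"{name}: {var / total:.0%}")
```

With these settings the idiosyncratic rater share comes out near 60%, mirroring the finding that rater effects, rather than ratee performance, dominate the variance in multisource ratings.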

3.

Purpose

This study specified an alternative model for examining the measurement invariance of multisource performance ratings (MSPRs), in order to systematically investigate the theoretical meaning of common method variance in the form of rater effects. As opposed to testing invariance based on a multigroup design with raters aggregated within sources, this study specified both performance dimension and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research in that MSPRs were found to lack invariance for raters from different levels. However, same-level raters provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applies an alternative model to examine invariance of MSPRs that allowed us to answer three questions that could not be addressed with more traditional multigroup designs. First, the model allowed us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance among raters from the same source relative to raters from different sources. Finally, our study allowed for inferences with respect to the invariance of idiosyncratic rater factors.

4.
Missing data are common in examination scoring, and making effective use of the available data is a key statistical problem. In examination scoring, the influence of items and raters on scores cannot be ignored. Based on generalizability theory, variance-component estimation formulas were derived for the two-facet crossed design (p × i × r) with missing data, following typical examination scoring rules; multiple sets of missing data were simulated in Matlab 7.0 to verify the validity of the formulas. Results showed that: (1) the derived formulas are fairly reliable, with relatively small bias in the estimated variance components, and they remain reasonably accurate even when more than 50% of the data are missing; (2) the number of items has the largest influence on variance-component estimation with missing data, followed by the number of raters; with 6 items and 5 raters, the formulas yield stable estimates; (3) the number of examinees has little influence on the estimates, and the variance components estimated under missing data differ little between small-scale and large-scale examinations.
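For reference, the complete-data baseline of the two-facet crossed p × i × r (person × item × rater) design can be sketched with the standard expected-mean-squares solution. This is a hypothetical illustration of the design that the derivation above extends to missing data, not the paper's missing-data formulas; the `pir_variance_components` helper, sample sizes, and generating variance components below are all invented.

```python
import numpy as np

def pir_variance_components(X):
    """Estimate variance components of a fully crossed person x item x rater
    design from observed mean squares (complete data only)."""
    p, i, r = X.shape
    m = X.mean()
    a_p = X.mean(axis=(1, 2)) - m                     # person main effects
    a_i = X.mean(axis=(0, 2)) - m                     # item main effects
    a_r = X.mean(axis=(0, 1)) - m                     # rater main effects
    pi = X.mean(axis=2) - m - a_p[:, None] - a_i[None, :]
    pr = X.mean(axis=1) - m - a_p[:, None] - a_r[None, :]
    ir = X.mean(axis=0) - m - a_i[:, None] - a_r[None, :]
    res = (X - m - a_p[:, None, None] - a_i[None, :, None] - a_r[None, None, :]
           - pi[:, :, None] - pr[:, None, :] - ir[None, :, :])
    ms = {  # mean squares: sums of squares divided by degrees of freedom
        "p": i * r * (a_p ** 2).sum() / (p - 1),
        "i": p * r * (a_i ** 2).sum() / (i - 1),
        "r": p * i * (a_r ** 2).sum() / (r - 1),
        "pi": r * (pi ** 2).sum() / ((p - 1) * (i - 1)),
        "pr": i * (pr ** 2).sum() / ((p - 1) * (r - 1)),
        "ir": p * (ir ** 2).sum() / ((i - 1) * (r - 1)),
        "pir,e": (res ** 2).sum() / ((p - 1) * (i - 1) * (r - 1)),
    }
    e = ms["pir,e"]
    # Solve the expected-mean-squares equations for the random-effects components
    return {
        "p": (ms["p"] - ms["pi"] - ms["pr"] + e) / (i * r),
        "i": (ms["i"] - ms["pi"] - ms["ir"] + e) / (p * r),
        "r": (ms["r"] - ms["pr"] - ms["ir"] + e) / (p * i),
        "pi": (ms["pi"] - e) / r,
        "pr": (ms["pr"] - e) / i,
        "ir": (ms["ir"] - e) / p,
        "pir,e": e,
    }

# Hypothetical example: 200 examinees, 6 items, 5 raters, no interaction effects
rng = np.random.default_rng(0)
p, i, r = 200, 6, 5
X = (rng.normal(0, 1.0, (p, 1, 1))      # person, sigma^2_p = 1.0
     + rng.normal(0, 0.5, (1, i, 1))    # item
     + rng.normal(0, 0.4, (1, 1, r))    # rater
     + rng.normal(0, 0.6, (p, i, r)))   # residual, sigma^2 = 0.36
print(pir_variance_components(X))
```

With 200 examinees, 6 items, and 5 raters, the person and residual components are recovered close to their generating values (1.0 and 0.36); the missing-data case handled by the paper requires replacing these closed-form mean squares.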

5.
Ninety subjects viewed one of three sets of videotapes which presented simulated work performances by five ratees. Subjects viewed videotapes where the true intercorrelation among the three job components was either high, moderate, or low. Half of the subjects completed ratings immediately after viewing all five ratees and again, 24 h later. The remaining subjects completed ratings only once, 24 h after viewing the videotapes. Rater intelligence was assessed via the Wesman Personnel Classification Test. Performance ratings for each ratee on each of the three job components were completed using a magnitude estimation scale. Subjects' ratings were compared to ratee true scores (based on objective worker output) to obtain four components of rater accuracy: elevation, differential elevation, stereotype accuracy, and differential accuracy as well as a measure of overall accuracy. Results indicated that subjects' ratings were more accurate with respect to overall accuracy and differential accuracy when the true intercorrelation among job components was high rather than low. Rater intelligence was significantly correlated with stereotype accuracy. In addition, rater intelligence was significantly related to overall accuracy, differential elevation, and elevation, but only when the true intercorrelation among job components was moderate or high. Also, there was a curvilinear component to the relationship between intelligence and both differential accuracy and stereotype accuracy such that the most intelligent raters tended to be less accurate than more moderately intelligent raters who were, in turn, more accurate than the least intelligent raters. Finally, subjects' immediate ratings were not more accurate than ratings provided by subjects who completed only delayed ratings. 
However, repeated measures analyses found that for subjects who completed both immediate and delayed ratings, delayed ratings were less accurate than immediate ratings with respect to overall accuracy, differential accuracy, and differential elevation.

6.
Hoyt WT. Psychological Methods, 2007, 12(4): 467-475
Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, see record 2007-18729-006) is a promising strategy for controlling bias variance and may yield information about sources of bias in heritability studies. D. A. Kenny's (2004) PERSON model is presented as a framework for understanding determinants of rating reliability and validity. Empirical findings on rater bias in other contexts provide a starting point for addressing the impact of rater-unique perceptions in heritability studies. However, heritability studies use distinctive rating designs that may accentuate some sources of bias, such as rater communication and contrast effects, which warrant further study.

7.
We tested the effects of rater agreeableness on the rating of others' poor performance in performance appraisal (PA). We also examined the interactions between rater agreeableness and two aspects of the rating context: ratee self-ratings and the prospect of future collaboration with the ratee. Participants (n = 230) were allocated to one of six experimental groups (a 3 × 2 between-groups design) or a control group (n = 20). Participants received accurate, low-deviated, or high-deviated self-ratings from the ratee. Half were notified they would collaborate with the ratee in a future task. High rater agreeableness, positive deviations in self-rating, and the prospect of future collaboration were all independent predictors of higher PA ratings. The interactions between rater agreeableness and rating context were very small. We argue that conflict avoidance is an important motivation in the PA process.

8.
Using a field sample of peers and subordinates, the current study employed generalizability theory to estimate sources of systematic variability associated with both developmental and administrative ratings (variance due to items, raters, etc.) and then used these values to estimate the dependability (i.e., reliability) of the performance ratings under various conditions. Results indicated that the combined rater and rater-by-ratee interaction effect and the residual effect were substantially larger than the person effect (i.e., object of measurement) for both rater sources across both purpose conditions. For subordinates, the person effect accounted for a significantly greater percentage of total variance in developmental ratings than in administrative ratings; however, no differences were observed for peer ratings as a function of rating purpose. These results suggest that subordinate ratings are of significantly better quality when made for developmental than for administrative purposes, but the same is not true for peer ratings.

9.

Purpose

The purpose of this study was to take an inductive approach in examining the extent to which organizational contexts represent significant sources of variance in supervisor performance ratings, and to explore various factors that may explain contextual rating variability.

Design/Methodology/Approach

Using archival field performance rating data from a large state law enforcement organization, we used a multilevel modeling approach to partition the variance in ratings due to ratees, raters, and rating contexts.

Findings

Results suggest that much of what may often be interpreted as idiosyncratic rater variance may actually reflect systematic rating variability across contexts. In addition, performance-related and non-performance factors, including contextual rating tendencies, accounted for significant rating variability.

Implications

Supervisor ratings represent the most common approach for measuring job performance, and understanding the nature and sources of rating variability is important for research and practice. Given the many uses of performance rating data, our findings suggest that continuing to identify contextual sources of variability is particularly important for addressing criterion problems, and improving ratings as a form of performance measurement.

Originality/Value

Numerous performance appraisal models suggest the importance of context; however, previous research had not partitioned the variance in supervisor ratings due to omnibus context effects in organizational settings. The use of a multilevel modeling approach allowed the examination of contextual influences, while controlling for ratee and rater characteristics.

10.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.
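The core point, that a Pearson interrater correlation simply discards systematic rater main effects such as leniency rather than charging them as error, can be seen in a small simulation (the variance components below are hypothetical, not drawn from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
true = rng.normal(0, 1.0, n)           # ratee true performance, sigma^2_p = 1.0
leniency = (-0.8, 0.8)                 # systematic rater main effects, sigma^2_r = 0.64
noise_sd = 0.6                         # rater-by-ratee interaction + error, sigma^2 = 0.36
rater1 = true + leniency[0] + rng.normal(0, noise_sd, n)
rater2 = true + leniency[1] + rng.normal(0, noise_sd, n)

# The correlation centers each rater's column, so the leniency gap vanishes:
# r estimates sigma^2_p / (sigma^2_p + sigma^2_pr,e) = 1 / 1.36, about .74
r12 = np.corrcoef(rater1, rater2)[0, 1]

# A single-rater dependability coefficient for absolute decisions also
# charges the rater main-effect variance as error: 1 / (1 + 0.64 + 0.36) = .50
phi = 1.0 / (1.0 + 0.64 + 0.36)
print(round(r12, 2), phi)
```

The interrater correlation lands near .74 no matter how large the leniency gap is, while the absolute-decision dependability coefficient is only .50: one concrete way an interrater correlation can overestimate the reliability relevant to a given decision.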

11.
The present study examined the moderating effect of rater personality – extroversion and sensitivity to others – on the relations between selection interview ratings and measures of candidate self-monitoring (SM) and social anxiety (SA). In a real-life military selection procedure setting in which 445 candidates and 93 raters participated, rater extroversion moderated the relation between candidate SM and selection interview ratings so that this relation was negative for raters low on extroversion and positive for raters high on extroversion. Rater extroversion was also found to moderate the negative relation between candidate SA and selection interview ratings. No support was found for the moderating effect of rater sensitivity to others. An explanation of the moderating effect of rater extroversion based on the assumption that extroversion is negatively related to critical interpersonal sensitivity was suggested.

12.
Research studies in psychology and education often seek to detect changes or growth in an outcome over a duration of time. This research provides a solution to those interested in estimating latent traits from psychological measures that rely on human raters. Rater effects potentially degrade the quality of scores in constructed response and performance assessments. We develop an extension of the hierarchical rater model (HRM), which yields estimates of latent traits that have been corrected for individual rater bias and variability, for ratings that come from longitudinal designs. The parameterization, called the longitudinal HRM (L-HRM), includes an autoregressive time series process to permit serial dependence between latent traits at adjacent timepoints, as well as a parameter for overall growth. We evaluate and demonstrate the feasibility and performance of the L-HRM using simulation studies. Parameter recovery results reveal predictable amounts and patterns of bias and error for most parameters across conditions. An application to ratings from a study of character strength demonstrates the model. We discuss limitations and future research directions to improve the L-HRM.
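The longitudinal backbone of the L-HRM, latent traits that follow an autoregressive process around an overall growth trend, can be sketched as a generating model. All parameter values below are hypothetical and this is not the authors' parameterization, only an illustration of AR(1) serial dependence plus growth:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_times = 200, 4
phi, growth, innov_sd = 0.6, 0.5, 0.4   # hypothetical AR(1), trend, and innovation SD

theta = np.zeros((n_persons, n_times))
theta[:, 0] = rng.normal(0, 1, n_persons)
for t in range(1, n_times):
    # each person's deviation from the growth trend persists with coefficient phi
    trend_prev, trend_now = growth * (t - 1), growth * t
    theta[:, t] = (trend_now + phi * (theta[:, t - 1] - trend_prev)
                   + rng.normal(0, innov_sd, n_persons))

# adjacent timepoints are serially dependent; the mean follows the growth trend
print(np.corrcoef(theta[:, 2], theta[:, 3])[0, 1])
print(theta.mean(axis=0))
```

In the full model these latent traits are not observed directly; they sit beneath rater-level scoring distributions that absorb individual rater bias and variability.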

13.
黎光明, 蒋欢. Journal of Psychological Science (心理科学), 2019, (3): 731-738
Tests that involve a rater facet often do not conform to any standard generalizability-theory design, so from the perspective of generalizability theory the resulting data should be treated as missing data, and the missing-data structure is determined by the test's scoring plan. Data under three scoring plans were simulated in R, and the estimation performance of the traditional method, the judgment method, and the splitting method was compared under each plan. Results showed that: (1) the traditional method estimates poorly; (2) when rater agreement is high, the judgment method is appropriate; (3) the splitting method yields the most accurate estimates, although under the fixed-rater scoring plan the ratio of raters to examinees must be considered, with accurate estimates obtained when this ratio is at most 0.0047.

14.
The social skills of 20 second- and sixth-grade students were assessed by 20 trained raters using the Social Skills Test for Children (SST-C). Rater and child characteristics were examined to determine whether differences in social skills ratings were due to the race of the rater or the race of the children being rated or due to the interactive effects of these characteristics, which would suggest racial bias in the ratings procedure. The results showed that the race of the rater did affect some behavioral observations. Black raters gave higher scores than white raters on four behavioral categories: response latency, appropriate assertion, effective assertion, and smiling. White raters gave higher scores for head position and gestures. The results of this study replicated earlier findings of significant differences in social skills ratings due to the race and age of the child being rated. The results also showed modest racial bias effects in that black and white raters scored black and white children differentially on two behavioral categories: overall skill ratings and smiling. These results suggested that most behavioral categories of the SST-C were not systematically affected by racial bias. However, the most subjective rating, overall skill, did evidence racial bias effects. This finding is consistent with previous data showing that subjective ratings may be most affected by racial bias.

15.
Rater bias in the EASI temperament scales: a twin study
Under trait theory, ratings may be modeled as a function of the temperament of the child and the bias of the rater. Two linear structural equation models are described, one for mutual self- and partner ratings, and one for multiple ratings of related individuals. Application of the first model to EASI temperament data collected from spouses rating each other shows moderate agreement between raters and little rating bias. Spouse pairs agree moderately when rating their twin children, but there is significant rater bias, with greater bias for monozygotic than for dizygotic twins. MLEs of heritability are approximately .5 for all temperament scales, with no common environmental variance. Results are discussed with reference to trait validity, the person-situation debate, halo effects, and stereotyping. Questionnaire development using ratings on family members permits increased rater agreement and reduced rater bias.

16.
Rater effects in creativity assessment refer to the influence that raters' participation in the assessment process exerts on assessment outcomes. Rater effects stem, in essence, from differences in raters' internal cognitive processing, and are reflected concretely in differences among their scores. This article first reviews research on rater cognition, and on the influence of rater, creator, and sociocultural factors on assessment. It then reviews, at the level of scoring outcomes, indices of inter-rater reliability and their limitations, as well as applications of generalizability theory and the many-facet Rasch model to quantifying and controlling rater effects. Finally, in view of open problems in current research, possible future directions are outlined, including deepening research on rater cognition, integrating research on rater effects at different levels, and extending the methods and technologies of creativity assessment.

17.
We examined Work Behavior-to-KSA (knowledge, skill, or ability) linkage ratings for 9 jobs to determine the degree to which differences in the ratings were due to rater type. We collected ratings from incumbents and 2 types of job analysts: project job analysts (analysts knowledgeable of the job) and nonproject job analysts (analysts with very little or no knowledge of the job). In our analyses of the data, we calculated means, standard deviations, effect sizes, and correlations for each rater type, as well as compared the reliability of the ratings. We also estimated variance components for each job by conducting generalizability analyses (Brennan, 1983; Shavelson, Webb, & Rowley, 1989). Our findings indicate that the level of linkage ratings is similar across rater types, that it is important to obtain ratings from multiple raters regardless of rater type, and that ratings from job analysts may be more reliable than those of incumbents.

18.
Rater bias is a substantial source of error in psychological research. Bias distorts observed effect sizes beyond the expected level of attenuation due to intrarater error, and the impact of bias is not accurately estimated using conventional methods of correction for attenuation. Using a model based on multivariate generalizability theory, this article illustrates how bias affects research results. The model identifies 4 types of bias that may affect findings in research using observer ratings, including the biases traditionally termed leniency and halo errors. The impact of bias depends on which of 4 classes of rating design is used, and formulas are derived for correcting observed effect sizes for attenuation (due to bias variance) and inflation (due to bias covariance) in each of these classes. The rater bias model suggests procedures for researchers seeking to minimize adverse impact of bias on study findings.

19.
Inter-rater reliability and accuracy are measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using intraclass correlation (2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
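The intraclass correlation (2,1) used above (two-way random effects, single rater, absolute agreement, in the Shrout–Fleiss numbering) can be computed from the two-way ANOVA mean squares. A minimal sketch on invented adherence ratings:

```python
import numpy as np

def icc_2_1(X):
    """Shrout & Fleiss ICC(2,1): n targets (rows) x k raters (columns),
    two-way random effects, absolute agreement, single-rater reliability."""
    n, k = X.shape
    grand = X.mean()
    row_m = X.mean(axis=1)   # per-target means
    col_m = X.mean(axis=0)   # per-rater means
    msr = k * ((row_m - grand) ** 2).sum() / (n - 1)     # between-targets mean square
    msc = n * ((col_m - grand) ** 2).sum() / (k - 1)     # between-raters mean square
    mse = (((X - row_m[:, None] - col_m[None, :] + grand) ** 2).sum()
           / ((n - 1) * (k - 1)))                        # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical data: 6 therapy sessions rated by 4 raters on a 0-6 adherence scale
ratings = np.array([[4, 5, 4, 4],
                    [2, 3, 2, 1],
                    [5, 6, 5, 5],
                    [3, 4, 3, 3],
                    [1, 2, 1, 1],
                    [4, 4, 3, 4]], dtype=float)
print(round(icc_2_1(ratings), 2))
```

Unlike ICC(3,1), ICC(2,1) keeps the between-rater mean square in the denominator, so systematic disagreement between raters (e.g., one consistently lenient rater) lowers the coefficient rather than being ignored.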

20.
The current study used a Social Relations Model to analyze self and peer ratings to explore the dynamics of team member perceptions and performance ratings. The results from 29 organizational teams who completed performance ratings of themselves and team members indicated that the largest share of rating variance was attributed to the relationship component, followed by the ratee component and then the rater component. Among other findings, the results indicated that self-ratings were related to how one rates, and is rated by, others; that there were high levels of reciprocity between peers for dimensions that were interpersonal in nature; and that raters tended to evaluate others similarly within, but not necessarily across, dimensions.
