Similar Articles
1.

Purpose

This study specified an alternative model for examining the measurement invariance of multisource performance ratings (MSPRs), in order to systematically investigate the theoretical meaning of common method variance in the form of rater effects. Rather than testing invariance with a multigroup design in which raters are aggregated within sources, this study specified both performance dimension factors and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers across a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research in that MSPRs lacked invariance across raters from different levels. However, raters from the same level provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applies an alternative model for examining the invariance of MSPRs, which allowed us to answer three questions that could not be addressed with more traditional multigroup designs. First, the model allowed us to examine how parameterizing idiosyncratic rater factors affects inferences about cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance among raters from the same source, relative to raters from different sources. Finally, our study allowed inferences about the invariance of the idiosyncratic rater factors themselves.

2.
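IRT Analysis of Rater Bias in Structured Interviews for National Civil Servant Recruitment   Total citations: 7 (self-citations: 1; by others: 6)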
Sun Xiaomin, Zhang Houcan. Acta Psychologica Sinica, 2006, 38(4): 614-625
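Using the many-facet Rasch model from item response theory (IRT), this study analyzed rater bias among two groups totaling 12 raters in structured interviews for national civil servant recruitment. Two kinds of rater bias were proposed and tested: differences in severity or leniency between raters, and the internal consistency of each rater. Results showed that raters differed significantly in severity, and that individual raters also differed in how consistent their rating behavior was across candidates, dimensions, candidate gender, and time. The study shows that this rater-level analysis overcomes a limitation of classical test theory (CTT), which analyzes raters only as a group. It provides detailed diagnostic information on each rater's biased behavior, and thus offers a new method grounded in modern measurement for targeted rater training and for building a rater pool.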
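For reference, the many-facet Rasch model is commonly written in Linacre's rating-scale form as shown below. The abstract does not state which parameterization the authors used, so treat this as the textbook form rather than their exact model. Here $P_{nijk}$ is the probability that candidate $n$ receives category $k$ from rater $j$ on dimension $i$. $B_n$ is candidate ability, $D_i$ is dimension difficulty, $C_j$ is the severity of rater $j$, and $F_k$ is the difficulty of category $k$ relative to $k-1$. A larger $C_j$ lowers the log-odds of higher categories, which is how the severity differences reported above are expressed.

$$\ln\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k$$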

3.
Inter-rater reliability and accuracy are both measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy, despite conceptual differences and literature suggesting important differences between the two. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters using a treatment adherence scale, and to assess factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings, and between raters, using the intraclass correlation coefficient ICC(2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected both inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors, as captured by the criterion ratings, correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e., the therapist and the intensity and frequency of rated behaviors) influenced inter-rater reliability and accuracy. The additional resources needed for a composite rating (a rating based on the average score of paired raters) may be justified by its improved accuracy over individual ratings. The additional time required to arrive at a consensus rating (a rating generated after discussion between 2 raters) may not be warranted. Further research is needed to determine whether these findings hold for other raters and treatment adherence scales.
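The abstract above scores both reliability and accuracy with ICC(2,1), which is Shrout and Fleiss's two-way random-effects, absolute-agreement, single-rater coefficient. Below is a minimal Python sketch of that formula. It assumes a complete targets × raters matrix with no missing cells. The function name `icc_2_1` is illustrative and does not come from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete matrix: rows are targets, columns are raters.

    Two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss). Requires at least 2 targets and 2 raters.
    """
    n = len(ratings)        # number of targets (e.g., therapists or sessions)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition total sum of squares into targets, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater mean differences (ms_cols) count against agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # identical raters
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # second rater is one point more lenient
```

The second call shows why this form measures absolute agreement. The two raters are perfectly correlated, yet a constant one-point offset pulls ICC(2,1) down to 2/3. To measure accuracy as the study did, one column could hold a rater's scores and the other the expert criterion scores.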

4.
Differential rater functioning (DRF) occurs when raters exercise differential severity or leniency when scoring examinees in different subgroups. Previous studies of DRF have examined rater bias using manifest variables (e.g., covariates) to define the subgroups, such as examinee gender and ethnicity; for example, a rater may score males more severely. Ideally, each rater's severity should be invariant across subgroups. This study examines DRF in the context of latent subgroups, which classify possible sources of DRF according to raters' scoring behavior rather than manifest factors. An extension of the latent class signal detection theory (LC-SDT) model for identifying DRF is proposed and examined using real-world data and simulations. Results from the real-world data show that the signal detection approach provides an effective method for identifying latent DRF. Simulations across varying sample sizes and levels of rater precision showed adequate parameter recovery, supporting the model's use for identifying latent DRF in large-scale data. These findings suggest that the DRF extension of the LC-SDT can be a useful model for examining rater characteristics and can provide information that aids rater training.

5.
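This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by Kang, Sun, and Zeng (2016) when predictors are included at both the examinee and rater levels (the full model). Results showed: (1) the full GR-MLFM is logically and mathematically sound, can be used in scoring subjective (constructed-response) items, and detects rater effects, their influencing factors, and the size of those influences reasonably well; (2) in scoring mathematical problem solving, raters showed two types of rating tendencies (leniency and severity effects), but for most raters the severity or leniency was not pronounced; rater conscientiousness positively predicted severity, self-confidence positively predicted leniency, and emotional stability and rating experience had no significant predictive effect.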

6.
7.
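The many-facet Rasch model (MFRM) was used to analyze rater bias among 28 raters evaluating the educational quality of nurseries and kindergartens. Results showed that the 28 raters differed significantly in severity; 3 raters had poor internal consistency, while the remaining 25 were fairly consistent; the interaction between raters and the classes being evaluated was not significant, whereas the interaction between raters and evaluation items was significant. The results indicate that MFRM supports detailed, individual-level analysis of rater bias in evaluating the educational quality of nurseries and kindergartens. From an item response theory perspective, it provides a modern educational and psychometric basis for targeted rater training, for assessing whether raters are qualified, and for building a pool of qualified raters.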

8.
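Compared with other assessment center techniques, assessor factors have a particularly strong influence on scores in the leaderless group discussion. This study examined how novice assessors' working memory and personality affect the validity of their ratings. First, novice assessors showed low inter-rater agreement and poor rating accuracy. Second, working memory and some personality factors affected the validity of novice assessors' ratings in different ways: (1) the more altruistic novice assessors were, the more accurate their overall mean ratings and the more lenient their ratings; (2) the more decisive novice assessors were, the more accurately they differentiated among all applicants; (3) the more composed novice assessors were, the more effectively they differentiated among dimensions; (4) attention-shifting and inhibition abilities had an inhibitory effect on novice assessors' halo effect and on their accuracy in differentiating across dimensions.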

9.
The purpose of this study was to approach the issue of rating ability by examining the influence of rater implicit theories and rater intelligence on rating outcomes. Using the inferential accuracy model (Jackson, 1972), raters were identified as either possessing a normative or idiosyncratic implicit theory of the occupation of college instructor. In a laboratory setting, 50 normative and 50 idiosyncratic raters judged the videotaped performance of either a good or poor lecturer. Results showed that (a) intelligence was positively related to rating accuracy and to possessing a normative implicit theory, (b) rater type moderated the relationship between intelligence and rating accuracy, and (c) controlling for intelligence, normative raters committed stronger halo effects than idiosyncratic raters. These results were discussed in relation to furthering the understanding of rating ability.

10.
The present study examined the moderating effect of rater personality (extroversion and sensitivity to others) on the relations between selection interview ratings and measures of candidate self-monitoring (SM) and social anxiety (SA). In a real-life military selection setting with 445 candidates and 93 raters, rater extroversion moderated the relation between candidate SM and selection interview ratings: the relation was negative for raters low in extroversion and positive for raters high in extroversion. Rater extroversion also moderated the negative relation between candidate SA and selection interview ratings. No support was found for a moderating effect of rater sensitivity to others. An explanation for the moderating effect of rater extroversion is suggested, based on the assumption that extroversion is negatively related to critical interpersonal sensitivity.

11.
This study explores the importance of anticipated group discussion, the consensus decision rule, and rater motivation in determining how well rater teams identify ratee behaviors (i.e., behavioral accuracy). Results, based on 382 raters in 111 teams, suggest that anticipating group discussion can improve behavioral accuracy, but the benefits of discussion-only teams appear to be limited to this anticipation effect. Rater motivation also appears to play an important role in this type of team. Rater teams required to reach consensus, however, appear to show improved behavioral accuracy regardless of whether raters can anticipate the consensus discussion and regardless of rater motivation levels. Implications, especially for assessment centers, are discussed.

12.
Zhang Yun, Weng Qingxiong. Advances in Psychological Science, 2018, 26(6): 1131-1140
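Multi-source evaluation is well established in companies outside China but is still at an exploratory and developing stage in China. Drawing on existing findings, this paper discusses the characteristics and underlying mechanisms of multi-source evaluation from three angles: the evaluation process, the rating sources, and the ratees. In terms of the process, evaluation serves multiple purposes and emphasizes anonymity, and the appropriate use of results is critical. In terms of rating sources, agreement between different sources is low, and ratings are prone to halo and leniency effects. In terms of ratees, individuals' reactions to multi-source evaluation results are influenced by personality traits, feedback signals, and the gap between self-ratings and others' ratings. Research also finds that the performance improvements produced by multi-source evaluation are unstable. Important questions for future research therefore include how to make the multi-source evaluation process more effective and accurate, how to improve evaluators' reactions to the results, and how to aggregate multi-source results effectively.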

13.
A program is described for computing interrater reliability by averaging, for each rater, the correlations between that rater's ratings and every other rater's ratings. When raters rate more than one ratee, raters' reliabilities can be computed either for each item or for each ratee. The program reads data from a text file and writes the reliability coefficients to a text file. It implements the standard Macintosh interface. The QuickBASIC program is distributed both as a listing and in compiled form; it runs to advantage on machines with a math coprocessor.
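The averaging procedure described above can be restated in a few lines. Below is a minimal Python sketch, not the original QuickBASIC program. It assumes that each rater's scores for one item are aligned over the same ratees and that the correlations are averaged arithmetically; the original may differ in such details. The function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists.

    Raises ZeroDivisionError if either rater gave identical scores to every ratee.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rater_reliabilities(scores):
    """Map each rater to the mean of their correlations with every other rater.

    `scores` maps rater id -> list of ratings for the same ratees (one item).
    """
    result = {}
    for rater, xs in scores.items():
        rs = [pearson(xs, ys) for other, ys in scores.items() if other != rater]
        result[rater] = sum(rs) / len(rs)
    return result


print(rater_reliabilities({
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],   # same ordering as A
    "C": [4, 3, 2, 1],   # reversed ordering
}))
```

Rater C disagrees with both colleagues and receives the lowest average, which is the kind of idiosyncratic rater this index is designed to flag. To get per-ratee reliabilities, pass scores across items for a single ratee instead.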

14.
Standardizing ADHD ratings in adults is important because symptom presentation differs in adults. The authors investigated the agreement and reliability of rater standardization in a large-scale trial of atomoxetine in adults with ADHD. Training of 91 raters on the investigator-administered ADHD Rating Scale (ADHDRS-IV-Inv) took place before the start of a large, 31-site atomoxetine trial. Agreement between raters on total scores was established in two ways: (a) the kappa coefficient (item-level rater agreement, with the percentage of raters giving identical item-by-item scores) and (b) intraclass correlation coefficients (reliability). For the ADHDRS-IV-Inv, rater agreement was moderate, and reliability, as measured by Cronbach's alpha, was substantial. The data indicate that clinicians can be trained to evaluate ADHD in adults reliably using the ADHDRS-IV-Inv.
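The abstract does not say which kappa variant was used. When many raters score the same standardized material, one common choice is Fleiss' kappa. The Python sketch below implements Fleiss' kappa purely as an illustrative assumption. It takes a table for a single item in which `counts[i][j]` is the number of raters who placed subject `i` in score category `j`.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rater counts.

    Every row must sum to the same number of raters n (n >= 2). Undefined
    (division by zero) if all ratings fall in a single category.
    """
    N = len(counts)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("each subject must be rated by the same number of raters")
    k = len(counts[0])

    # Share of all assignments that fall in each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per subject: proportion of agreeing rater pairs.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N                 # mean observed agreement
    P_e = sum(pj * pj for pj in p)     # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


print(fleiss_kappa([[3, 0], [0, 3]]))  # all 3 raters agree on both subjects
print(fleiss_kappa([[2, 1], [1, 2]]))  # a 2-vs-1 split on each subject
```

A kappa of 1 means perfect agreement, and values at or below 0 mean agreement no better than chance. The second example gives -1/3 because the split votes agree less often than the balanced category margins would predict.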

15.
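For tests that take a long time to score, the effect of time on scoring precision cannot be ignored, so research on rater drift has attracted considerable attention. Building on the graded response multilevel facets model proposed by Kang, Sun, and Zeng (2016), this study constructed a graded response multilevel rater drift model for detecting rater drift and evaluated its performance through a simulation study. Results showed that the model estimated item and ability parameters precisely. Compared with the fixed-effects model, the model with random rater effects detected rater drift more effectively and was more valid and stable.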

16.
The potential for age discrimination and stereotypes to affect performance evaluations is growing. Although conscientious raters might be expected to evaluate carefully, little is known about whether they show more or less bias toward certain age groups. Therefore, using a time-lagged design, we investigated the effects of rater conscientiousness on performance evaluations of younger and older actual co-workers (N = 242). We found that more conscientious raters gave higher ratings to older workers than to younger workers on task performance and organizational citizenship behaviours. Specifically, we tested a mediated moderation model in which the interaction between rater conscientiousness and ratee age predicts ratee-perceived conscientiousness, which in turn predicts performance ratings. The model was significant for older ratees but not for younger ratees. We discuss our results in terms of “similar to me” effects and implications for organizational practices.

17.
Raters who pursue different goals give different ratings   Total citations: 5 (self-citations: 0; by others: 5)
J. N. Cleveland and K. R. Murphy (1992) suggested that phenomena such as rater errors and interrater disagreements could be understood in terms of differences in the goals pursued by various raters. We measured 19 rating goals among students at the beginning of a semester, grouped them into scales, and correlated these scales with teacher evaluations collected at the end of the semester. We found significant multiple correlations, both within classes and in an analysis of the pooled sample (adjusting for instructor mean differences, incremental R² = .08). Measures of rating goals obtained after raters had observed a significant proportion of ratee performance accounted for variance (incremental R² = .07) not accounted for by measures of goals obtained at the beginning of the semester.

18.
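Rater effects in creativity assessment are the influences that raters' involvement exerts on assessment results. They stem essentially from differences in raters' internal cognitive processing and show up as differences in their ratings. This paper first reviews research on rater cognition and on how raters, creators, and sociocultural factors influence assessment. Second, at the level of rating outcomes, it reviews indices of inter-rater reliability and their limitations, as well as applications of generalizability theory and the many-facet Rasch model to quantifying and controlling these effects. Finally, based on problems that remain in current research, it outlines possible future directions: deepening research on rater cognition, integrating research on rater effects at different levels, and extending the methods and techniques of creativity assessment.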

19.
Data collected at two law enforcement agencies were used to address three specific issues concerning the development and implementation of frame-of-reference rater training. First, the prototype-anchored rating system was presented as a comprehensive method for generating an appropriate frame of reference in an organizational setting. Second, sensitivity and threshold analyses were used to demonstrate a method for identifying idiosyncratic raters (i.e., raters deviating from the appropriate frame of reference) in the rater population. Finally, areas of performance where supervisors and subordinates were likely to disagree on the frame of reference were identified. Concerning this latter issue, analyses indicated supervisors viewed poor-performance incidents more severely than did patrol officers on several dimensions of performance. To a lesser degree, supervisors and patrol officers also differed on their perceptions of the importance of poor-performance incidents. The implications of these findings are discussed in relation to the development and implementation of frame-of-reference rater training.

20.
The effects of rater and ratee race on performance ratings of managers were examined. Ratings were obtained from peers, subordinates and bosses as part of a multirater, developmental feedback program for managers. Two data sets were created for purposes of this study. The between-subjects data set consisted of ratings from over 20,000 bosses, over 50,000 peers, and over 40,000 subordinates. The repeated measures data set was substantially smaller because it included only those Black and White managers who were rated by both a Black and White rater from each of the three perspectives. Results for rater race indicated that Black raters from all perspectives (peers, subordinates, and bosses) assigned more favorable ratings to ratees of their own race. Results for White raters differed according to the particular rating source. White bosses assigned more favorable ratings to ratees of their own race, but White subordinates did not. White peers assigned more favorable ratings to Whites in the repeated measures analysis, but not in the between-subjects analysis. Results for ratee race indicated that both White and Black managers received higher ratings from Black raters than from White raters, and the effect was more pronounced for ratings assigned to Black managers.
