Similar articles
20 similar articles found.
1.
Inter‐rater reliability and accuracy are measures of rater performance. Inter‐rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter‐rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert‐generated criterion ratings and between raters using intraclass correlation (2,1). Inter‐rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter‐rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter‐rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter‐rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
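As a concrete illustration of the index named above, the following Python sketch computes ICC(2,1) (two-way random effects, absolute agreement, single rater; Shrout & Fleiss, 1979) from a small ratings matrix. The data are hypothetical, not taken from the study; the same function could be applied to two raters' scores (inter-rater reliability) or to a rater's scores paired with the expert criterion ratings (accuracy).

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss, 1979). `ratings` is an n_subjects x k_raters array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    # Sums of squares for subjects (rows), raters (columns), and residual error
    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_error = ((ratings - grand_mean) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical example: 6 session segments rated by 2 raters on a 1-7 adherence scale
pair = np.array([[4, 5], [6, 6], [2, 3], [5, 5], [3, 4], [7, 6]])
print(round(icc_2_1(pair), 3))
```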

2.
Relative to other assessment-center techniques, assessor factors have an especially strong influence on the scores produced in the leaderless group discussion. This study examined how novice assessors' working memory and personality affect the validity of their ratings in leaderless group discussions. The results showed, first, that novice assessors had low inter-rater agreement and poor rating accuracy. Second, working memory and certain personality facets influenced rating validity in different ways: (1) the more altruistic the novice assessor, the more accurate the overall mean rating and the more lenient the scores; (2) the more decisive the novice assessor, the more accurately all candidates were differentiated; (3) the more composed the novice assessor, the more effectively the rating dimensions were differentiated; and (4) attention-switching and inhibition abilities had a suppressing effect on novice assessors' halo effect and on the accuracy of their differentiation across dimensions.

4.
The purpose of this study was to approach the issue of rating ability by examining the influence of rater implicit theories and rater intelligence on rating outcomes. Using the inferential accuracy model (Jackson, 1972), raters were identified as either possessing a normative or idiosyncratic implicit theory of the occupation of college instructor. In a laboratory setting, 50 normative and 50 idiosyncratic raters judged the videotaped performance of either a good or poor lecturer. Results showed that (a) intelligence was positively related to rating accuracy and to possessing a normative implicit theory, (b) rater type moderated the relationship between intelligence and rating accuracy, and (c) controlling for intelligence, normative raters committed stronger halo effects than idiosyncratic raters. These results were discussed in relation to furthering the understanding of rating ability.

5.
This study investigates the effects of rater personality (Conscientiousness and Agreeableness), rating format (graphic rating scale vs. behavioral checklist), and the rating social context (face‐to‐face feedback vs. no face‐to‐face feedback) on rating elevation of performance ratings. As predicted, raters high on Agreeableness showed more elevated ratings than those low on Agreeableness when they expected to have the face‐to‐face feedback meeting. Furthermore, rating format moderated the relationship between Agreeableness and rating elevation, such that raters high on Agreeableness provided less elevated ratings when using the behavioral checklist than the graphic rating scale, whereas raters low on Agreeableness showed little difference in elevation across different rating formats. Results also suggest that the interactive effects of rater personality, rating format, and social context may depend on the performance level of the ratee. The implications of these findings will be discussed.

6.
Rater effects in creativity assessment refer to the influence that the rater's involvement in the assessment process exerts on the assessment results. Rater effects ultimately stem from differences in raters' internal cognitive processing and are manifested as differences in their scores. This paper first reviews research on rater cognition and on how rater, creator, and sociocultural factors affect assessment. At the level of scoring outcomes, it then summarizes the indices of inter-rater reliability and their limitations, as well as the application of generalizability theory and the many-facet Rasch model to quantifying and controlling rater effects. Finally, in view of the problems that remain, possible directions for future research are identified, including deepening research on rater cognition, integrating research on rater effects at different levels, and extending the methods and techniques of creativity assessment.
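To make the limitation of simple agreement indices concrete, the short Python sketch below uses made-up ratings in which one rater is uniformly one point harsher than the other: a consistency-type index (Pearson correlation) is perfect even though the two raters never give the same score. This is exactly the kind of severity effect that generalizability theory and the many-facet Rasch model are used to separate out. All numbers are illustrative assumptions.

```python
import numpy as np

# Made-up creativity ratings for 8 products; rater B is uniformly one point
# harsher than rater A (a severity effect).
rater_a = np.array([5, 3, 4, 6, 2, 5, 4, 3])
rater_b = rater_a - 1

# A consistency-type index is blind to the constant offset ...
print("Pearson r:", np.corrcoef(rater_a, rater_b)[0, 1])             # 1.0

# ... whereas absolute-agreement views expose it.
print("Exact agreement:", np.mean(rater_a == rater_b))               # 0.0
print("Mean severity difference:", rater_a.mean() - rater_b.mean())  # 1.0
```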

7.
The present study updates Woehr and Huffcutt's (1994) rater training meta‐analysis and demonstrates that frame‐of‐reference (FOR) training is an effective method of improving rating accuracy. The current meta‐analysis includes over four times as many studies as included in the Woehr and Huffcutt meta‐analysis and also provides a snapshot of current rater training studies. The present meta‐analysis also extends the previous meta‐analysis by showing that not all operationalizations of accuracy are equally improved by FOR training; Borman's differential accuracy appears to be the most improved by FOR training, along with behavioural accuracy, which provides a snapshot into the cognitive processes of the raters. We also investigate the extent to which FOR training protocols differ, the implications of protocol differences, and if the criteria of interest to FOR researchers have changed over time.
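Since differential accuracy is the criterion highlighted above, here is a minimal Python sketch of one common operationalization: for each performance dimension, correlate the rater's scores with expert "true" scores across ratees, then average over dimensions. The function name and the data are hypothetical illustrations, not the meta-analysis's own computation, and individual studies may instead use Cronbach's (1955) full accuracy decomposition.

```python
import numpy as np

def differential_accuracy(ratings, true_scores):
    """Mean over dimensions of the across-ratee correlation between observed
    ratings and expert 'true' scores (a simplified, Borman-style index)."""
    ratings = np.asarray(ratings, dtype=float)      # shape: ratees x dimensions
    true_scores = np.asarray(true_scores, dtype=float)
    corrs = [np.corrcoef(ratings[:, d], true_scores[:, d])[0, 1]
             for d in range(ratings.shape[1])]
    return float(np.mean(corrs))

# Hypothetical data: 5 ratees scored on 3 dimensions by one trainee rater.
true = np.array([[5, 4, 3], [3, 5, 4], [4, 3, 5], [2, 2, 2], [5, 5, 4]])
obs  = np.array([[4, 4, 3], [3, 4, 4], [4, 3, 4], [2, 3, 2], [5, 5, 5]])
print(round(differential_accuracy(obs, true), 2))
```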

8.
张赟  翁清雄 《心理科学进展》2018,26(6):1131-1140
Multi-source assessment has matured in its use in companies abroad, but in China it is still at the stage of exploration and development. Drawing on existing research, this paper examines the characteristics and underlying mechanisms of multi-source assessment from three perspectives: the assessment process, the rating sources, and the ratee. In terms of the process, the purposes of assessment are multiple, anonymity of the rating format matters, and appropriate use of the results is essential. In terms of rating sources, agreement across different sources is low, and halo and leniency effects arise easily. In terms of the ratee, individuals' reactions to multi-source assessment results are affected by personality traits, the sign of the feedback, and the gap between self- and other-ratings. Research also finds that the performance improvement brought about by multi-source assessment is unstable. Accordingly, how to improve the validity and accuracy of the multi-source assessment process, how to improve ratees' reactions to the results, and how to aggregate multi-source ratings effectively are important questions for future research.

9.
Despite the popularity of frame‐of‐reference training (FORT), it is not clear how different structural elements of FORT work in concert to improve rating accuracy. Furthermore, past rater training studies have lacked rigorous control groups, leading to low thresholds for showing improvements in rating accuracy due to FORT. The current study allowed for the isolation of components of rater training that increase rating accuracy when compared to a rigorously designed control group. Results indicated that repeated rendering of practice ratings improves rating accuracy, and this practice effect was amplified by practice rating feedback. Although accuracy‐based training content improved interrater agreement, it did not contribute to improvements in rating accuracy over and above the control group. We discuss the implications of the findings in relation to best practices for designing rater training programs.

10.
黎光明  蒋欢 《心理科学》2019,(3):731-738
Tests that include a rater facet typically do not conform to any standard generalizability-theory design, so from the perspective of generalizability theory the resulting data should be treated as missing data, and it is the test's scoring scheme that determines the missingness structure. Data under three scoring schemes were simulated with R, and the estimation performance of the traditional method, the evaluation method, and the splitting method was compared under each scheme. The results showed that: (1) the traditional method estimates poorly; (2) when rater agreement is high, the evaluation method is suitable; and (3) the splitting method gives the most accurate estimates; only under the fixed-rater scoring scheme does the ratio of the number of raters to the number of examinees need attention, with estimates being fairly accurate when this ratio is at most 0.0047.
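The original simulation was done in R; to keep this page's examples in one language, the sketch below is a Python approximation of the simplest case only: a fully crossed person × rater design with complete data, estimating variance components by the standard ANOVA (expected-mean-squares) method and forming a generalizability coefficient. The sample sizes and variance values are assumed for illustration and do not reproduce the three scoring schemes or the traditional/evaluation/splitting estimators compared in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a fully crossed person x rater design with assumed (not study) parameters.
n_p, n_r = 200, 4
sd_person, sd_rater, sd_resid = 1.0, 0.5, 0.8
person_eff = rng.normal(0, sd_person, size=(n_p, 1))
rater_eff = rng.normal(0, sd_rater, size=(1, n_r))
scores = 3.0 + person_eff + rater_eff + rng.normal(0, sd_resid, size=(n_p, n_r))

# ANOVA estimates of the variance components for the p x r design.
grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
ss_e = ((scores
         - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True)
         + grand) ** 2).sum()
ms_e = ss_e / ((n_p - 1) * (n_r - 1))

var_e = ms_e                            # residual (person x rater, error)
var_p = (ms_p - ms_e) / n_r             # person (universe-score) variance
var_r = (ms_r - ms_e) / n_p             # rater (severity) variance
g_coef = var_p / (var_p + var_e / n_r)  # generalizability coefficient (relative)
print(var_p, var_r, var_e, g_coef)
```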

11.
This study explores the importance of anticipated group discussion, the consensus decision rule, and rater motivation in determining how well rater teams identify ratee behaviors, i.e., behavioral accuracy. Results, based on 382 raters in 111 teams, suggest that the anticipation of group discussion can improve behavioral accuracy, but it appears that the benefits of discussion-only teams are limited to this anticipation effect. Furthermore, it also appears that rater motivation plays an important role in this type of team. Rater teams required to reach consensus, however, appear to show improved behavioral accuracy, regardless of whether raters can anticipate the consensus discussion and regardless of rater motivation levels. Implications, especially for assessment centers, are discussed.

12.
This study investigates the factors that motivate the preference of individuals, or ratees, participating in multi‐source assessment (MSA) processes for some raters over others. Two rater characteristics were assessed to attempt to identify these preferences: rater familiarity and the affect toward the rater. Two separate studies were conducted to assess the extent to which these characteristics are used. The extent to which the purpose of the appraisal (developmental vs. administrative) and the rating source (peer vs. subordinate) influenced the use of these characteristics was also investigated. Evidence from these two studies suggests that ratees selected their raters based on rater familiarity but not on affect. In addition, while the purpose of the appraisal did not influence selection patterns, the preference for peers was motivated by different factors than the preference for subordinates. The implications of these results for research and practice are discussed.

13.
This study investigated how personal cognitive style and training affect rating validity with two different rating tasks. Male undergraduate volunteers (n = 53) served as raters and rated videotaped lecturers. Using the Embedded Figures Test to measure cognitive style, two groups of raters were formed: those who tend to structure the information presented (articulated) and those who do not (global). Half of each cognitive style group received observational training designed to be congruent with the behavioral rating task. All raters completed two rating tasks: one requiring an evaluative judgment and one requiring a judgment of behavior frequency. It was hypothesized that with the evaluative rating task, cognitive style would be, and training would not be, a significant predictor of validity, because the training was not relevant to the task. It was also hypothesized that with the observational task, training would improve rating validity (overcoming cognitive style), because the training was relevant to the rating task. Both hypotheses were supported. I wish to thank Dr. Kevin Murphy for the use of the videotapes.

14.
15.
When providing performance ratings, it is commonly assumed that raters agree more on rating items that are behaviorally based and observable than on items that are vague and less behaviorally based. This study empirically investigated the relationships between agreement among raters, raters' perceptions regarding their difficulty in providing ratings, and expert assessments of the behavioral observability of each item. The results, based on 611 raters in two studies conducted in different locations, suggest that contrary to common expectations, rater agreement can increase as raters' reported rating difficulty increases and as behavioral observability decreases. Explanations and implications are discussed.

16.
Several investigators have examined the relationship of a rater's cognitive complexity to accurate empathic prediction of a target's self-concept or behavior, with mixed results. The present study sought to clarify this relation by considering both the conceptual differentiation (functionally independent construction) and integration (ordination) of both rater and target as they bear on predictive accuracy at early and later stages of acquaintance. Two sets of ten subjects participated in weekly self-disclosure groups, and attempted to predict one another's self-ratings on personal constructs after four and eighteen weeks of structured dyadic interaction. Results suggested that (a) the conceptual structure of the rater was unrelated to predictive accuracy, (b) high differentiated/low integrated targets were less accurately predicted at Time 1, (c) raters generally became more accurate predictors over time, and (d) conceptual structure was related to predictive accuracy at early, but not advanced stages of relationship. These findings were interpreted within an expanded theoretical framework emphasizing the multidimensional assessment of cognitive complexity as well as the stage of acquaintance at which social prediction takes place.

17.
Frame-of-reference (FOR) rater training is one technique used to impart a theory of work performance to raters. In this study, the authors explored how raters' implicit performance theories may differ from a normative performance theory taught during training. The authors examined how raters' level and type of idiosyncrasy predicts their rating accuracy and found that rater idiosyncrasy negatively predicts rating accuracy. Moreover, although FOR training may improve rating accuracy even for trainees with lower performance theory idiosyncrasy, it may be more effective in improving errors of omission than commission. The discussion focuses on the roles of idiosyncrasy in FOR training and the implications of this research for future FOR research and practice.

18.
This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by 康春花, 孙小坚, and 曾平飞 (2016) when predictor variables at both the examinee and rater levels are included (the full model). The results showed that: (1) the full GR-MLFM is logically and mathematically sound, is suitable for the scoring of constructed-response items, and can detect rater effects, the factors that influence them, and the size of their influence reasonably well; (2) in an operational scoring of mathematical problem solving, raters showed two types of rating tendency (leniency and severity effects), although for most raters the tendency was not pronounced; raters' conscientiousness positively predicted their severity, their self-confidence positively predicted their leniency, and emotional stability and scoring experience were not significant predictors.

19.
Causes of the leniency effect in performance appraisal and methods for controlling it
The leniency effect in performance appraisal is highly damaging to an organization's human resource management. This paper analyzes the causes of the leniency effect in performance appraisal from three angles: the appraisal context, the appraisal instrument, and the rater. Contextual factors mainly include organizational culture, the purpose of the appraisal, and the anonymity of the appraisal; instrument factors mainly concern the clarity and structuredness of the appraisal criteria; rater factors include the rater's motivation, cognitive processes, affect, emotions, and stable personality traits. On the basis of this analysis of causes, methods for controlling the leniency effect in performance appraisal are proposed. Finally, the shortcomings of previous research and several questions requiring further study are pointed out.

20.
Hoyt WT. Psychological Methods, 2007, 12(4): 467-475
Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, see record 2007-18729-006) is a promising strategy for controlling bias variance and may yield information about sources of bias in heritability studies. D. A. Kenny's (2004) PERSON model is presented as a framework for understanding determinants of rating reliability and validity. Empirical findings on rater bias in other contexts provide a starting point for addressing the impact of rater-unique perceptions in heritability studies. However, heritability studies use distinctive rating designs that may accentuate some sources of bias, such as rater communication and contrast effects, which warrant further study.
