Similar Documents (20 results)
1.
Variations in ratings of externalizing and internalizing symptoms may contain a trait (i.e., shared view) component when behavioral symptoms that generalize across contexts are perceived, and an individual view component when they are misperceived or when each informant has access to different symptoms. Using a LISREL model, we estimated the trait and the informant-specific, individual view components in parental ratings of externalizing and internalizing symptoms of adolescent siblings. The model demonstrated that mothers' and fathers' ratings contained a substantial individual view component (from 21% to 50% of total rating variance, depending on rater and trait). Except for fathers' ratings of internalizing symptoms (13%), parental ratings also contained a substantial trait component (42% to 58%). Mother's, father's, and child's ratings may be averaged to estimate a trait of externalizing. To estimate an internalizing trait, it may be best to combine just the mother's rating with the child's self-rating.
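A toy simulation makes the trait/individual-view decomposition concrete. This is a minimal sketch, not the article's LISREL model; the loadings, noise levels, and variable names are invented for illustration.

```python
# Minimal sketch (assumed numbers, not the article's estimates): each
# informant's rating = shared trait + informant-specific view/error.
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # hypothetical number of adolescents
trait = rng.normal(size=n)                 # shared trait (e.g., externalizing)
mother = 0.7 * trait + rng.normal(scale=0.7, size=n)
father = 0.7 * trait + rng.normal(scale=0.7, size=n)
child  = 0.7 * trait + rng.normal(scale=0.7, size=n)

# With uncorrelated individual views, the mother-father correlation
# estimates the shared (trait) proportion of rating variance (~0.50 here).
print(np.corrcoef(mother, father)[0, 1])

# Averaging informants raises the trait share of variance, which is why
# the abstract suggests averaging ratings to estimate a trait (~0.75 here).
composite = (mother + father + child) / 3
print(np.corrcoef(composite, trait)[0, 1] ** 2)
```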

2.
Inter-rater reliability and accuracy are measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters using a treatment adherence scale, and to assess factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using intraclass correlation, ICC(2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors in the criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e., the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
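The statistic named here, ICC(2,1) from Shrout and Fleiss (1979), can be computed directly from two-way ANOVA mean squares. The implementation and data below are our own minimal sketch, not the study's code.

```python
# ICC(2,1): two-way random effects, absolute agreement, single rater.
import numpy as np

def icc_2_1(x):
    """x: targets (rows) by raters (columns) matrix of ratings."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means, col_means = x.mean(axis=1), x.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # targets
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters
    ss_err = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Works both for agreement between paired raters and for "accuracy"
# of a rater paired with expert criterion ratings (invented data):
print(icc_2_1([[4, 5], [2, 2], [3, 4], [5, 5], [1, 2]]))
```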

3.
Despite the popularity of frame-of-reference training (FORT), it is not clear how different structural elements of FORT work in concert to improve rating accuracy. Furthermore, past rater training studies have lacked rigorous control groups, leading to low thresholds for showing improvements in rating accuracy due to FORT. The current study allowed for the isolation of components of rater training that increase rating accuracy when compared to a rigorously designed control group. Results indicated that repeated rendering of practice ratings improves rating accuracy, and this practice effect was amplified by feedback on the practice ratings. Although accuracy-based training content improved interrater agreement, it did not contribute to improvements in rating accuracy over and above the control group. We discuss the implications of the findings in relation to best practices for designing rater training programs.
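Accuracy in such designs is scored against expert "true" ratings; one classic scoring scheme is Cronbach's (1955) accuracy components. The abstract does not say which index this study used, so the sketch below is an illustrative assumption, with invented data.

```python
# Cronbach's (1955) accuracy components for one rater: squared
# differences from true scores, partitioned into four parts.
import numpy as np

def cronbach_components(ratings, true_scores):
    """ratings, true_scores: ratees x dimensions arrays."""
    d = np.asarray(ratings, float) - np.asarray(true_scores, float)
    elevation  = d.mean() ** 2                              # overall offset
    diff_elev  = np.mean((d.mean(axis=1) - d.mean()) ** 2)  # per-ratee offset
    stereotype = np.mean((d.mean(axis=0) - d.mean()) ** 2)  # per-dimension
    diff_acc   = np.mean((d - d.mean(axis=1, keepdims=True)
                            - d.mean(axis=0, keepdims=True) + d.mean()) ** 2)
    return elevation, diff_elev, stereotype, diff_acc       # sum = mean(d**2)

rater = [[5, 3, 4], [2, 2, 3], [4, 5, 4]]   # 3 ratees x 3 dimensions
truth = [[4, 3, 4], [2, 3, 3], [5, 5, 3]]
print(cronbach_components(rater, truth))
```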

4.
When providing performance ratings, it is commonly assumed that raters agree more on rating items that are behaviorally based and observable than on items that are vague and less behaviorally based. This study empirically investigated the relationships between agreement among raters, raters' perceptions regarding their difficulty in providing ratings, and expert assessments of the behavioral observability of each item. The results, based on 611 raters in two studies conducted in different locations, suggest that contrary to common expectations, rater agreement can increase as raters' reported rating difficulty increases and as behavioral observability decreases. Explanations and implications are discussed.
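Item-level agreement of the kind examined here is often quantified with the r_wg index (James, Demaree, & Wolf, 1984), which compares the observed rating variance on an item with the variance expected if raters responded at random. The study does not name its agreement statistic, so this is one plausible operationalization, with invented data.

```python
# r_wg for one item: 1 = perfect agreement, 0 (or below) = chance-level.
import numpy as np

def r_wg(ratings, scale_points):
    s2 = np.var(ratings, ddof=1)                 # observed rating variance
    sigma2_eu = (scale_points ** 2 - 1) / 12.0   # uniform "no agreement" null
    return 1.0 - s2 / sigma2_eu                  # usually truncated at 0

print(r_wg([4, 4, 5, 4, 4], scale_points=5))   # high agreement (~0.90)
print(r_wg([1, 3, 5, 2, 4], scale_points=5))   # no agreement (negative)
```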

5.
We tested the effects of rater agreeableness on the rating of others' poor performance in performance appraisal (PA). We also examined the interactions between rater agreeableness and two aspects of the rating context: ratee self-ratings and the prospect of future collaboration with the ratee. Participants (n = 230) were allocated to one of six experimental groups (a 3 × 2 between-groups design) or a control group (n = 20). Participants received accurate, low-deviated, or high-deviated self-ratings from the ratee. Half were notified they would collaborate with the ratee in a future task. High rater agreeableness, positive deviations in self-rating, and the prospect of future collaboration were all independent predictors of higher PA ratings. The interactions between rater agreeableness and rating context were very small. We argue that conflict avoidance is an important motivation in the PA process.

6.
This study extends multisource feedback research by assessing the effects of rater source and raters' cultural value orientations on rating bias (leniency and halo). Using a motivational perspective of performance appraisal, the authors posit that subordinate raters, followed by peers, will exhibit more rating bias than superiors. More important, given that multisource feedback systems were premised on low power distance and individualistic cultural assumptions, the authors expect raters' power distance and individualism-collectivism orientations to moderate the effects of rater source on rating bias. Hierarchical linear modeling of data collected from 1,447 superiors, peers, and subordinates who provided developmental feedback to 172 military officers shows that (a) subordinates exhibit the most rating leniency, followed by peers and superiors; (b) subordinates demonstrate more halo than superiors and peers, whereas superiors and peers do not differ; (c) the effects of power distance on leniency and halo are stronger for subordinates than for peers and superiors; (d) the effects of collectivism on leniency were stronger for subordinates and peers than for superiors; effects on halo were stronger for subordinates than superiors, but did not differ between subordinates and peers. The present findings highlight the role of raters' cultural values in multisource feedback ratings.
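For concreteness, leniency and halo have simple, widely used operationalizations: mean elevation of a rater's scores, and lack of differentiation across dimensions within each ratee. The definitions and numbers below are illustrative assumptions, not the authors' scoring procedure.

```python
# One rater's scores: ratees (rows) x performance dimensions (columns).
import numpy as np

ratings = np.array([[6, 6, 5],
                    [5, 6, 6],
                    [6, 5, 6]], dtype=float)

leniency = ratings.mean() - 4.0   # elevation above the midpoint of a
                                  # hypothetical 7-point scale; > 0 = lenient
halo = np.std(ratings, axis=1, ddof=1).mean()
                                  # mean within-ratee SD across dimensions;
                                  # smaller = less differentiation = more halo
print(leniency, halo)
```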

7.
One of the central objectives of inclusive education, and education in general, is to support not only every student's academic learning but also their social and emotional development. It is therefore important to identify difficulties in a child's socio-emotional development at school. The current study investigates students' emotional inclusion and social inclusion, as well as students' academic self-concept, from four different perspectives using the Perceptions of Inclusion Questionnaire (PIQ). In particular, we analyzed the degree of agreement between teacher, mother, and father ratings and students' self-reports. Moreover, we tested whether students' gender and special educational needs (SEN) predict possible bias in parent and teacher reports. Survey participants included 721 Austrian Grade 4 students from 48 classes. In addition, data from 46 teachers, 466 mother reports, and 375 father reports were included. We assessed the consistency (i.e., agreement) between the different raters by means of multitrait-multimethod analyses, or more precisely, a correlated trait-correlated method minus one (CT-C[M-1]) model. Results of the CT-C(M-1) analyses indicated a rather strong rater bias (i.e., method effects) for all three dimensions of inclusion. However, the consistency for academic self-concept was higher than for emotional and social inclusion. Furthermore, gender and SEN status affected rater bias, particularly for teacher reports. The results suggest that it matters who reports students' emotional inclusion, social inclusion, and academic self-concept, which has methodological and practical implications.
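For readers unfamiliar with the approach, the CT-C(M-1) measurement equations (Eid, 2000) can be sketched as follows; taking the student self-report as the reference method is our assumption, and the symbols are generic notation rather than the article's.

```latex
\begin{align}
  Y_{t,\mathrm{ref}} &= \alpha_{t,\mathrm{ref}}
      + \lambda_{t,\mathrm{ref}} T_t + E_{t,\mathrm{ref}}
      && \text{(reference rater, assumed: student self-report)} \\
  Y_{tm} &= \alpha_{tm} + \lambda_{tm} T_t + \gamma_{tm} M_m + E_{tm}
      && \text{(non-reference rater $m$: teacher, mother, father)} \\
  \mathrm{MS}(Y_{tm}) &= \frac{\gamma_{tm}^{2}\,\mathrm{Var}(M_m)}
      {\lambda_{tm}^{2}\,\mathrm{Var}(T_t) + \gamma_{tm}^{2}\,\mathrm{Var}(M_m)
       + \mathrm{Var}(E_{tm})}
      && \text{(method specificity: the rater-bias share)}
\end{align}
```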

8.
Self-report data on Extraversion (E) and Neuroticism (N), together with ratings by the co-twin, were obtained from a sample of 826 adult female twin pairs ascertained through a population-based twin register. Data were analyzed using a model that allowed for the contributions to personality ratings of the rater's personality (rater bias) as well as of the personality of the person being rated. For E, but not for N, significant rater bias was found, with extraverted respondents tending to underestimate, and introverted respondents tending to overestimate, the Extraversion of their co-twins. Good agreement between self-reports and ratings by the respondent's co-twin was found for both E and N. Substantial genetic influences were found for both personality traits, confirming findings from genetic studies of personality that have relied only on self-reports of respondents.
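The contrast-style bias reported for E can be illustrated with a toy regression: model the co-twin rating as a function of both the target's and the rater's own trait, with a negative weight on the latter. All coefficients and names here are invented.

```python
# Toy model: rating = b1 * target_trait + b2 * rater_trait + noise,
# with b2 < 0 mimicking extraverts underrating their co-twins' E.
import numpy as np

rng = np.random.default_rng(1)
n = 826                                    # pairs, as in the study
e_self = rng.normal(size=n)                # rater's own Extraversion
e_twin = 0.4 * e_self + rng.normal(scale=0.9, size=n)   # twins correlate
rating = 0.6 * e_twin - 0.2 * e_self + rng.normal(scale=0.6, size=n)

# Regressing the rating on both traits recovers the bias term (~ -0.2):
X = np.column_stack([e_twin, e_self, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
print(beta)
```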

9.
The social skills of 20 second- and sixth-grade students were assessed by 20 trained raters using the Social Skills Test for Children (SST-C). Rater and child characteristics were examined to determine whether differences in social skills ratings were due to the race of the rater, the race of the children being rated, or the interactive effects of these characteristics, which would suggest racial bias in the rating procedure. The results showed that the race of the rater did affect some behavioral observations. Black raters gave higher scores than white raters on four behavioral categories: response latency, appropriate assertion, effective assertion, and smiling. White raters gave higher scores for head position and gestures. The results of this study replicated earlier findings of significant differences in social skills ratings due to the race and age of the child being rated. The results also showed modest racial bias effects in that black and white raters scored black and white children differentially on two behavioral categories: overall skill ratings and smiling. These results suggested that most behavioral categories of the SST-C were not systematically affected by racial bias. However, the most subjective rating, overall skill, did evidence racial bias effects. This finding is consistent with previous data showing that subjective ratings may be most affected by racial bias.

10.
Error in performance ratings is typically believed to be due to the cognitive complexity of the rating task. Distributional assessment (DA) is proposed to improve rater accuracy by reducing cognitive load. In two laboratory studies, raters reported perceptions of cognitive effort and difficulty while assessing rating targets using DA or the traditional assessment approach. Across both studies, DA raters showed greater interrater agreement, and Study 2 findings provide some support for DA being associated with greater true score rating accuracy. However, DA raters also reported experiencing greater cognitive load during the rating task, and cognitive load did not mediate the relationship between rating format and rater accuracy. These findings have important implications regarding our understanding of cognitive load in the rating process.

11.
12.
Hoyt WT. Psychological Methods, 2007, 12(4): 467-475
Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, see record 2007-18729-006) is a promising strategy for controlling bias variance and may yield information about sources of bias in heritability studies. D. A. Kenny's (2004) PERSON model is presented as a framework for understanding determinants of rating reliability and validity. Empirical findings on rater bias in other contexts provide a starting point for addressing the impact of rater-unique perceptions in heritability studies. However, heritability studies use distinctive rating designs that may accentuate some sources of bias, such as rater communication and contrast effects, which warrant further study.

13.
Using parallel self-, peer, and teacher rating scales, several rating biases in children's peer ratings of depression, anxiety, and aggression were examined. Participants were 66 inpatient and 133 elementary school children (N = 199; 109 boys, 90 girls; 61% white, 39% black) aged 8 to 12, and their teachers. Results showed significant halo bias in both the children's peer ratings and the teachers' ratings. Children's self-reports on each of the three traits were significantly related to their peer ratings of the same trait, while adjusting for socioeconomic status and the peers' teachers' ratings of the same trait. Children who rated themselves as high on each trait rated their peers significantly higher on the same trait than children who rated themselves as medium or low; and for depression and anxiety, those who rated themselves as medium rated their peers significantly higher on those traits than those who rated themselves as low. For both depression and aggression, children's self-reports on a trait were significantly related to their peer ratings of the same trait, but not to their peer ratings of different traits. Disagreements between children's and teachers' ratings of the peers on all three traits were significantly related to child self-reports on each trait, indicating a possible distortion in children's peer ratings due to self-report. The implications of the results for both peer and others' assessments are discussed, and further investigation of rating biases in other informants' assessments is encouraged. (These data were collected as part of the author's doctoral dissertation submitted to Memphis State University. Appreciation is expressed to Stacey Donegan for assistance with the literature review for an earlier version of this paper, presented at the meeting of the Society for Research in Child Development, New Orleans, March 1993.)

14.
The rater agreement literature is complicated by the fact that it must accommodate at least two different properties of rating data: the number of raters (two versus more than two) and the rating scale level (nominal versus metric). While kappa statistics are most widely used for nominal scales, intraclass correlation coefficients have been preferred for metric scales. In this paper, we suggest a dispersion-weighted kappa framework for multiple raters that integrates some important agreement statistics by using familiar dispersion indices as weights for expressing disagreement. These weights are applied to ratings identifying cells in the traditional inter-judge contingency table. Novel agreement statistics can be obtained by applying less familiar indices of dispersion in the same way.
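A minimal sketch of the framework's idea: treat each subject's category counts across raters as a distribution, score disagreement with a dispersion index, and form a kappa by comparing observed with chance-expected dispersion. The particular index (Gini/Simpson) and the data below are our illustrative choices, not necessarily the paper's.

```python
# Dispersion-based kappa for multiple raters, nominal categories.
import numpy as np

def dispersion_kappa(counts):
    """counts: subjects x categories; each row sums to the rater count."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)  # per-subject shares
    gini = 1.0 - (p ** 2).sum(axis=1)               # disagreement per subject
    marg = counts.sum(axis=0) / counts.sum()        # marginal category shares
    gini_chance = 1.0 - (marg ** 2).sum()           # chance disagreement
    return 1.0 - gini.mean() / gini_chance

counts = [[5, 0, 0],   # 4 subjects, 5 raters, 3 nominal categories
          [4, 1, 0],
          [0, 5, 0],
          [1, 1, 3]]
print(dispersion_kappa(counts))   # ~0.64 for these invented data
```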

15.
Rater bias is a substantial source of error in psychological research. Bias distorts observed effect sizes beyond the expected level of attenuation due to intrarater error, and the impact of bias is not accurately estimated using conventional methods of correction for attenuation. Using a model based on multivariate generalizability theory, this article illustrates how bias affects research results. The model identifies 4 types of bias that may affect findings in research using observer ratings, including the biases traditionally termed leniency and halo errors. The impact of bias depends on which of 4 classes of rating design is used, and formulas are derived for correcting observed effect sizes for attenuation (due to bias variance) and inflation (due to bias covariance) in each of these classes. The rater bias model suggests procedures for researchers seeking to minimize adverse impact of bias on study findings.
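The abstract does not reproduce the design-specific formulas, but the classical special case they generalize, disattenuating an observed correlation by the trait-variance share of each rating, is easy to state; the function and numbers below are a sketch under our own naming.

```python
# Classical disattenuation: observed r shrinks by the square root of the
# product of the two ratings' reliable (trait) variance proportions.
def corrected_r(r_obs, trait_share_x, trait_share_y):
    return r_obs / (trait_share_x * trait_share_y) ** 0.5

# If bias + error leave only 60% trait variance in each rating,
# an observed r of .30 implies a true-score r of .50:
print(corrected_r(0.30, 0.6, 0.6))
```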

16.
An IRT Analysis of Rater Bias in the Structured Interview for National Civil Servant Selection
Sun Xiaomin (孙晓敏) & Zhang Houcan (张厚粲). Acta Psychologica Sinica (心理学报), 2006, 38(4): 614-625
Using the many-facet Rasch model from item response theory (IRT), we analyzed rater bias among two panels comprising 12 raters in structured interviews for national civil servant selection. Two types of rater bias were proposed and verified: differences in severity (leniency) between raters, and inconsistency within individual raters. Results showed significant differences in severity across raters, and raters also differed in the consistency of their own rating behavior across candidates, dimensions, candidate gender, and time. The study shows that this rater-level analysis overcomes the limitation of classical test theory (CTT), which can only characterize raters at the group level, and provides detailed diagnostic information on each rater's bias, offering a modern psychometric approach to targeted rater training and to building rater pools.
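The many-facet Rasch model used here is conventionally written as follows (Linacre's formulation); the facet symbols are standard notation, not copied from the paper.

```latex
\begin{equation}
  \log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k
\end{equation}
% \theta_n: ability of candidate n;   \delta_i: difficulty of dimension i;
% \alpha_j: severity of rater j (the between-rater bias analyzed here);
% \tau_k: step difficulty of moving into rating category k.
```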

17.
THE CONTROL OF BIAS IN RATINGS: A THEORY OF RATING
Based on several years of research and a careful analysis of the rating process, Wherry developed a theory of rating. An accurate rating is seen as a function of three major components: performance of the ratee, observation of that performance by the rater, and recall of those observations by the rater. Cast in the mold of classical psychometric theory, each of these components is seen as consisting of a systematic portion and a random portion. The systematic portion of each component is further broken down. The performance of the ratee is a combination of true ability or aptitude for the job and the influence of the environment. What the rater observes is a function of the performance of the ratee and a bias of observation, and what the rater recalls is a result of those observations combined with a bias of recall. The development of the theory of rating unfolds by defining the various factors that affect each of these components in a series of linear equations. Various theorems and corollaries are proposed which should lead to a maximization of the true ability component of the ratee and minimize environmental influence and the bias and error components. The theorems and corollaries suggest testable hypotheses for the researcher in performance evaluation.
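The three components described above can be rendered schematically as linear equations; the symbols are ours, with bias and random-error terms shown explicitly.

```latex
\begin{align}
  P &= A + C             && \text{performance: true ability $A$ plus environment $C$} \\
  O &= P + B_{o} + e_{o} && \text{observation: performance plus observation bias and error} \\
  R &= O + B_{r} + e_{r} && \text{recalled rating: observation plus recall bias and error}
\end{align}
```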

18.
This study investigates the effects of rater personality (Conscientiousness and Agreeableness), rating format (graphic rating scale vs. behavioral checklist), and the rating social context (face-to-face feedback vs. no face-to-face feedback) on rating elevation of performance ratings. As predicted, raters high on Agreeableness showed more elevated ratings than those low on Agreeableness when they expected to have a face-to-face feedback meeting. Furthermore, rating format moderated the relationship between Agreeableness and rating elevation, such that raters high on Agreeableness provided less elevated ratings when using the behavioral checklist than the graphic rating scale, whereas raters low on Agreeableness showed little difference in elevation across rating formats. Results also suggest that the interactive effects of rater personality, rating format, and social context may depend on the performance level of the ratee. The implications of these findings are discussed.

20.
Rating scales have become the instrument of choice in labeling and assessing change in behavior of hyperactive children. However, several criticisms have recently been levied against their use. The present investigation examined the concurrent validity and inter- and intrarater reliability of the Abbreviated Teacher Questionnaire (ATQ; Conners, 1973) and the Rating Scales for Hyperkinesis (Davids, 1971). Sixteen teachers from two special and two regular schools (grades 1-4) rated 211 normal and 49 special children using both scales. High correlations were found, suggesting excellent predictability between scales and considerable stability across time and rater. Lower scores on a subsequent rating relative to an initial rating were demonstrated, dependent on the time between ratings but independent of (a) teacher expectation of treatment gains, (b) bias produced by rating selected children, and (c) whether children were hyperactive or normal. Use of initial and infrequent rating scores versus subsequent, closely spaced ratings was related to the rater's objective (e.g., diagnosis, treatment, or assessment).
