
1.

Computing interrater reliability on the Apple Macintosh computer

John O. Brooks Laura L. Brooks 《Behavior research methods》1991,23(1):82-84

A program is described for computing interrater reliability by averaging, for each rater, the correlations between that rater’s ratings and every other rater’s ratings. When raters rate more than one ratee, each rater’s reliability can be computed either for each item or for each ratee. The program reads data from a text file and writes the reliability coefficients to a text file. The standard Macintosh interface is implemented. The QuickBASIC program is distributed both as a listing and in compiled form; it can take advantage of a math coprocessor when one is present.
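The averaging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ QuickBASIC program: the function names `pearson` and `rater_reliabilities` are hypothetical, and the sketch assumes every rater scored the same items in the same order with no missing values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rater_reliabilities(ratings):
    """ratings[r] holds rater r's scores for the same items, in the same order.

    Returns one value per rater: the mean of that rater's correlations
    with every other rater.
    """
    k = len(ratings)
    result = []
    for i in range(k):
        rs = [pearson(ratings[i], ratings[j]) for j in range(k) if j != i]
        result.append(sum(rs) / len(rs))
    return result


# Three hypothetical raters scoring five items.
ratings = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [5, 4, 3, 2, 1],
]
print(rater_reliabilities(ratings))  # [0.0, 0.0, -1.0]
```

The third rater reverses the order of the others, so it is the only one whose average correlation is strongly negative; a low per-rater value flags a rater who disagrees with the rest of the panel.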

2.

Differences in Inter‐Rater Reliability and Accuracy for a Treatment Adherence Scale

《Cognitive behaviour therapy》2013,42(4):230-239

Inter‐rater reliability and accuracy are measures of rater performance. Inter‐rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter‐rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert‐generated criterion ratings and between raters using the intraclass correlation ICC(2,1). Inter‐rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter‐rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors, as reflected in the criterion ratings, correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter‐rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter‐rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
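For readers unfamiliar with ICC(2,1), the Shrout and Fleiss two-way random-effects, absolute-agreement, single-rater form can be computed from ANOVA mean squares as sketched below. This is an illustrative sketch assuming a complete targets × raters matrix with no missing ratings; `icc_2_1` is a hypothetical name, and the software used in the study is not described in the abstract.

```python
def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x[i][j] is the rating of target i by rater j (complete data assumed).
    """
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Six targets rated by four judges (the Shrout & Fleiss, 1979, example).
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(data), 2))  # 0.29
```

Because the rater (column) mean square enters the denominator, consistent differences in rater severity lower ICC(2,1) even when raters rank the targets similarly.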

3.

Optimism and Positivity Biases in Performance Appraisal Ratings: Empirical Evidence from Professional Soccer

Steffen Merkel Ho Fai Chan Sascha L. Schmidt Benno Torgler 《Psychologie appliquee》2021,70(3):1100-1127

Using unique assessment data for players from a German Bundesliga club’s youth academy, we tested four core hypotheses on how player ratings and rater or ratee-related characteristics reflect the (prospective) optimism bias and (retrospective) positivity bias. The results indicate not only that the ratings of predicted and remembered performance are indeed higher than the talents’ actual performance throughout a season, but that these differences depend positively on the rater’s organizational experience and negatively on the amount of ratee data available. They also suggest that (prospective) anticipation is even more positively biased than (retrospective) recollection of player performances, underscoring the asymmetry between looking forward and looking backward.

4.

The impact of rater agreeableness and rating context on the evaluation of poor performance

Raymond Randall Daniel Sharples 《Journal of Occupational & Organizational Psychology》2012,85(1):42-59

We tested the effects of rater agreeableness on the rating of others’ poor performance in performance appraisal (PA). We also examined the interactions between rater agreeableness and two aspects of the rating context: ratee self‐ratings and the prospect of future collaboration with the ratee. Participants (n = 230) were allocated to one of six experimental groups (a 3 × 2 between‐groups design) or a control group (n = 20). Participants received accurate, low‐deviated, or high‐deviated self‐ratings from the ratee. Half were notified they would collaborate with the ratee in a future task. High rater agreeableness, positive deviations in self‐rating, and the prospect of future collaboration were all independent predictors of higher PA ratings. The interactions between rater agreeableness and rating context were very small. We argue that conflict avoidance is an important motivation in the PA process.

5.

Impact and Causes of Rater Severity/Leniency in Appraisals without Postevaluation Communication Between Raters and Ratees

Chris Dewberry Anna Davies‐Muir Simon Newell 《International Journal of Selection & Assessment》2013,21(3):286-293

In performance appraisals, some assessors are substantially more lenient than others. Research on this effect in appraisals involving communication and interaction between raters and ratees after the performance evaluation has taken place indicates that it may be at least partly caused by individual differences in assessor personality. However, little is known about the impact or causes of rater severity versus leniency in situations in which there is little or no contact between raters and ratees after the performance evaluation. In Study 1 (N = 174) the strength of the severity–leniency effect in this ‘no‐contact’ context is estimated and found to be similar to that reported for ‘with‐contact’ appraisals. No evidence of an association between assessor personality and assessor severity (vs. leniency) is found in the ‘no‐contact’ context. In Study 2 (N = 54) there is no evidence of an association between the fluid cognitive ability of assessors and the severity of their ratings in a no‐contact context. It is concluded that the severity versus leniency effect probably has a considerable impact on performance ratings in ‘no‐contact’ appraisal settings, but that neither rater personality nor rater cognitive ability appears to play a significant role in this.
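One simple way to quantify rater severity versus leniency in a fully crossed design is each rater's average deviation from the ratee means. The sketch below is an assumption for illustration only; the abstract does not say how the effect was estimated in Study 1, and `leniency_indices` is a hypothetical name.

```python
def leniency_indices(x):
    """x[i][j] is the rating of ratee i by rater j (every rater rates every ratee).

    Returns each rater's mean deviation from the ratee means:
    positive values indicate leniency, negative values severity.
    """
    n, k = len(x), len(x[0])
    ratee_means = [sum(row) / k for row in x]
    return [sum(x[i][j] - ratee_means[i] for i in range(n)) / n for j in range(k)]


# Two hypothetical ratees, three hypothetical raters.
scores = [
    [3, 4, 5],
    [2, 3, 4],
]
print(leniency_indices(scores))  # [-1.0, 0.0, 1.0]
```

Centring on the ratee means removes differences in true performance between ratees, so what remains for each rater is (under the fully crossed assumption) that rater's systematic tendency to score high or low.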

6.

Rater bias in the EASI temperament scales: a twin study (Total citations: 1; self-citations: 0; citations by others: 1)

M C Neale J Stevenson 《Journal of personality and social psychology》1989,56(3):446-455

Under trait theory, ratings may be modeled as a function of the temperament of the child and the bias of the rater. Two linear structural equation models are described, one for mutual self- and partner ratings, and one for multiple ratings of related individuals. Application of the first model to EASI temperament data collected from spouses rating each other shows moderate agreement between raters and little rating bias. Spouse pairs agree moderately when rating their twin children, but there is significant rater bias, with greater bias for monozygotic than for dizygotic twins. Maximum likelihood estimates (MLEs) of heritability are approximately .5 for all temperament scales, with no common environmental variance. Results are discussed with reference to trait validity, the person-situation debate, halo effects, and stereotyping. Questionnaire development using ratings of family members permits increased rater agreement and reduced rater bias.

7.

EXAMINING RATING SOURCE VARIATION IN WORK BEHAVIOR TO KSA LINKAGES

LAURA E. BARANOWSKI LANCE E. ANDERSON 《Personnel Psychology》2005,58(4):1041-1054

We examined Work Behavior to knowledge, skill, or ability (KSA) linkage ratings for 9 jobs to determine the degree to which differences in the ratings were due to rater type. We collected ratings from incumbents and 2 types of job analysts: project job analysts (analysts knowledgeable about the job) and nonproject job analysts (analysts with very little or no knowledge of the job). For each rater type, we calculated means, standard deviations, effect sizes, and correlations, and we compared the reliability of the ratings across rater types. We also estimated variance components for each job by conducting generalizability analyses (Brennan, 1983; Shavelson, Webb, & Rowley, 1989). Our findings indicate that the level of linkage ratings is similar across rater types, that it is important to obtain ratings from multiple raters regardless of rater type, and that ratings from job analysts may be more reliable than those of incumbents.
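The generalizability analyses mentioned above estimate variance components from ANOVA mean squares. The sketch below shows only the simplest one-facet case (every person rated once by every rater); the study's actual design was richer, and `g_study` plus its returned keys are hypothetical names. Negative component estimates are set to zero, a common convention.

```python
def g_study(x, n_raters=None):
    """One-facet, fully crossed persons x raters G study (one rating per cell).

    Returns estimated variance components for persons, raters, and the
    person x rater interaction confounded with error, plus the relative
    generalizability coefficient for a mean over n_raters raters.
    """
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    p_means = [sum(row) / k for row in x]
    r_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]

    ss_p = k * sum((m - grand) ** 2 for m in p_means)
    ss_r = n * sum((m - grand) ** 2 for m in r_means)
    ss_pr = sum((v - grand) ** 2 for row in x for v in row) - ss_p - ss_r

    ms_p = ss_p / (n - 1)
    ms_r = ss_r / (k - 1)
    ms_pr = ss_pr / ((n - 1) * (k - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_p = max((ms_p - ms_pr) / k, 0.0)
    var_r = max((ms_r - ms_pr) / n, 0.0)
    var_pr = ms_pr

    n_r = n_raters or k
    # Relative coefficient: the rater main effect does not enter the error term.
    g_coef = var_p / (var_p + var_pr / n_r)
    return {"person": var_p, "rater": var_r, "person_x_rater_e": var_pr, "g": g_coef}


# Hypothetical 6 persons x 4 raters matrix.
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print({key: round(v, 2) for key, v in g_study(data, n_raters=1).items()})
# {'person': 2.56, 'rater': 5.24, 'person_x_rater_e': 1.02, 'g': 0.71}
```

Passing a larger `n_raters` shows how averaging over more raters raises the coefficient, which is the kind of decision-study reasoning behind the recommendation to obtain ratings from multiple raters.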

8.

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Stefanie A. Wind Edward W. Wolfe George Engelhard Jr. Peter Foltz Mark Rosenstein 《International Journal of Testing》2018,18(1):27-49

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be “trained” using machine-learning techniques that incorporate human ratings. However, the quality of the human ratings used to train the AESEs is rarely examined. As a result, the impact of various rater effects (e.g., severity and centrality) on the quality of AESE-assigned scores is not known. In this study, we use data from a large-scale rater-mediated writing assessment to examine the impact of rater effects on the quality of AESE-assigned scores. Overall, the results suggest that if rater effects are present in the ratings used to train an AESE, the AESE scores may replicate these effects. Implications are discussed in terms of research and practice related to automated scoring.

9.

A Hierarchical Rater Model for Longitudinal Data

Jodi M. Casabianca Brian W. Junker Ricardo Nieto Mark A. Bond 《Multivariate behavioral research》2017,52(5):576-592

Research studies in psychology and education often seek to detect changes or growth in an outcome over a duration of time. This research provides a solution to those interested in estimating latent traits from psychological measures that rely on human raters. Rater effects potentially degrade the quality of scores in constructed response and performance assessments. We develop an extension of the hierarchical rater model (HRM), which yields estimates of latent traits that have been corrected for individual rater bias and variability, for ratings that come from longitudinal designs. The parameterization, called the longitudinal HRM (L-HRM), includes an autoregressive time series process to permit serial dependence between latent traits at adjacent timepoints, as well as a parameter for overall growth. We evaluate and demonstrate the feasibility and performance of the L-HRM using simulation studies. Parameter recovery results reveal predictable amounts and patterns of bias and error for most parameters across conditions. An application to ratings from a study of character strength demonstrates the model. We discuss limitations and future research directions to improve the L-HRM.
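To make "rater bias and variability" concrete, the sketch below shows a discrete-normal ("signal detection") rater stage of the kind commonly used in the HRM: each rater turns an ideal rating into an observed category with probabilities centred on ideal + bias and spread by a rater-specific variability. This is an assumption for illustration; it omits the item-response stage and the L-HRM's autoregressive and growth components, and `rating_probs` is a hypothetical name.

```python
from math import exp


def rating_probs(ideal, bias, spread, categories):
    """Probability that one rater assigns each category, given the ideal rating.

    Discrete-normal form (assumed): the weight for category c is
    exp(-(c - (ideal + bias))**2 / (2 * spread**2)), normalised to sum to 1.
    bias > 0 means the rater tends to score above the ideal (lenient);
    larger spread means a less consistent rater.
    """
    weights = [exp(-((c - (ideal + bias)) ** 2) / (2 * spread ** 2)) for c in categories]
    total = sum(weights)
    return [w / total for w in weights]


cats = [0, 1, 2, 3, 4]
fair = rating_probs(ideal=2, bias=0.0, spread=0.7, categories=cats)
lenient = rating_probs(ideal=2, bias=1.0, spread=0.7, categories=cats)
print(max(cats, key=lambda c: fair[c]))     # 2
print(max(cats, key=lambda c: lenient[c]))  # 3
```

Estimating `bias` and `spread` for each rater is what lets a model of this family report latent-trait estimates corrected for individual rater effects.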

10.

FACTORIAL INVARIANCE OF PERSONALITY RATINGS

Jorma Kuusinen 《Scandinavian journal of psychology》1969,10(1):33-44

Kuusinen, J. Factorial invariance of personality ratings. Scand. J. Psychol., 1969, 10, 33–44. Three studies employing the same set of 33 personality rating scales are described. (1) Five factor structures of the scales, derived from ratings of five groups of stimuli, were compared. (2) One subject rated different personality concepts three times at one-week intervals. (3) Twelve individual factor structures were compared with the factor structure computed from peer ratings of a group of 39 subjects. Results: (a) the factor structure of the scales was largely independent of the stimuli, (b) the structure for an individual was stable, (c) the group structure represented the individual's structure, and (d) differences between the group structure and the individual structures, and between individuals, were negligible.
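The abstract does not state how factor structures were compared. One widely used index for this kind of comparison is Tucker's congruence coefficient between loading vectors on the same variables; the sketch below assumes that index and uses hypothetical loadings.

```python
from math import sqrt


def congruence(a, b):
    """Tucker's congruence coefficient between two factor-loading vectors
    (same variables, same order). Ranges from -1 to 1; 1 means the loadings
    are proportional."""
    num = sum(x * y for x, y in zip(a, b))
    return num / sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical loadings of five rating scales on one factor in two analyses.
group = [0.80, 0.75, 0.60, 0.10, 0.05]
individual = [0.78, 0.70, 0.65, 0.15, 0.00]
print(round(congruence(group, individual), 3))
```

Unlike a Pearson correlation, the coefficient is not mean-centred, so it is sensitive to whether loadings agree in absolute pattern, not just in rank order; rule-of-thumb cut-offs for "the same factor" vary across the literature.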

11.

A Meta-Analysis of the Relationship Between Rater Liking and Performance Ratings

Ashley W. Sutton Sean P. Baldwin Lauren Wood 《Human Performance》2013,26(5):409-429

This meta-analysis reviewed the magnitude and moderators of the relationship between rater liking and performance ratings. The results revealed substantial overlap between rater liking and performance ratings (ρ = .77). Although this relationship is often interpreted as indicative of bias, we review studies that indicate that to some extent the relationship between liking and performance ratings potentially reflects “true” differences in ratee performance. Moderator analyses indicated that the relationship between liking and performance ratings was weaker for ratings of organizational citizenship behaviors, ratings made by peer raters, ratings in nonsales jobs, and ratings made for development; however, the relationship was strong across moderator levels, underscoring the robustness of this relationship. Implications for the interpretation of performance ratings are discussed.
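The reported ρ is an estimated population correlation; which artifact corrections were applied is not stated in the abstract. The sketch below shows only the first, "bare-bones" step in the Hunter and Schmidt tradition: a sample-size-weighted mean correlation, its observed variance, and the variance expected from sampling error alone. The function name and study values are hypothetical.

```python
def bare_bones(studies):
    """studies: list of (sample_size, observed_r) pairs.

    Returns the sample-size-weighted mean r, the weighted observed variance of r,
    and the variance expected from sampling error alone.
    """
    total_n = sum(n for n, _ in studies)
    r_bar = sum(n * r for n, r in studies) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n
    mean_n = total_n / len(studies)
    var_err = (1 - r_bar ** 2) ** 2 / (mean_n - 1)
    return r_bar, var_obs, var_err


# Hypothetical studies, not data from the meta-analysis above.
print(bare_bones([(50, 0.2), (150, 0.6)]))
```

When the observed variance clearly exceeds the sampling-error variance, the leftover variation is what moderator analyses like those described above try to explain.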

12.

Accountability and need for cognition effects on contrast, halo, and accuracy in performance ratings

Palmer JK Feldman JM 《The Journal of psychology》2005,139(2):119-137

In the present study, the authors investigated the effects of accountability and need for cognition on contrast errors, halo, and accuracy of performance ratings examined in good and poor performance context conditions, as well as in a context-free control condition. The accountability manipulation reduced the contrast effect and also modified rater recall of good ratee behavior. Accountability reduced halo in ratings and increased rating accuracy in a poor performance context. Accountability also interacted with need for cognition in predicting individual rater halo.
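The abstract does not define its halo measure. A common operationalization is how little a rater differentiates between dimensions for the same ratee, for example the mean within-ratee standard deviation across dimensions. The sketch below assumes that index (using the population SD); `halo_index` and the data are hypothetical.

```python
from statistics import pstdev


def halo_index(profiles):
    """profiles: one list of dimension ratings per ratee, all from the same rater.

    Returns the rater's mean within-ratee standard deviation across dimensions.
    Smaller values mean less differentiation between dimensions (more halo).
    """
    return sum(pstdev(p) for p in profiles) / len(profiles)


# Hypothetical rater who gives each ratee nearly the same score on every dimension.
print(halo_index([[4, 4, 5], [2, 2, 2], [3, 3, 3]]))
```

A low value is only suggestive of halo: a ratee who genuinely performs evenly across dimensions also produces a flat profile, which is why accuracy is usually assessed against criterion scores as well.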

13.

INTERRATER CORRELATIONS DO NOT ESTIMATE THE RELIABILITY OF JOB PERFORMANCE RATINGS (Total citations: 5; self-citations: 1; citations by others: 4)

KEVIN R. MURPHY RICHARD DESHON 《Personnel Psychology》2000,53(4):873-900

Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.
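The attenuation correction that this abstract questions is the classical formula r_xy / sqrt(r_xx · r_yy), with the interrater correlation plugged in as the criterion reliability r_yy. The sketch below shows that practice with hypothetical numbers, plus one narrow illustration of the argument: a rater who is uniformly more lenient leaves the Pearson correlation unchanged, so that systematic rater effect does not register in an interrater correlation at all. Function names are hypothetical.

```python
from math import sqrt


def disattenuate(r_xy, r_yy, r_xx=1.0):
    """Classical correction for attenuation: r_xy / sqrt(r_xx * r_yy).
    Leaving r_xx = 1.0 corrects for unreliability in the criterion only."""
    return r_xy / sqrt(r_xx * r_yy)


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical values: observed validity .25, interrater correlation .50 used as r_yy.
print(round(disattenuate(0.25, 0.50), 3))  # 0.354

# A rater who is uniformly 2 points more lenient leaves the correlation unchanged,
# so this systematic rater effect is invisible to an interrater correlation.
rater_a = [3, 4, 2, 5, 4]
rater_b = [3, 5, 2, 4, 4]
lenient_b = [v + 2 for v in rater_b]
print(abs(pearson(rater_a, rater_b) - pearson(rater_a, lenient_b)) < 1e-12)  # True
```

Whether the corrected value is too high or too low depends on how rater variance is treated, which is the generalizability-theory point the article develops.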

14.

RATER-RATEE RACE EFFECTS IN DEVELOPMENTAL PERFORMANCE RATINGS OF MANAGERS

MICHAEL K. MOUNT MARCIA R. SYTSMA JOY FISHER HAZUCHA KATHERINE E. HOLT 《Personnel Psychology》1997,50(1):51-69

The effects of rater and ratee race on performance ratings of managers were examined. Ratings were obtained from peers, subordinates and bosses as part of a multirater, developmental feedback program for managers. Two data sets were created for purposes of this study. The between-subjects data set consisted of ratings from over 20,000 bosses, over 50,000 peers, and over 40,000 subordinates. The repeated measures data set was substantially smaller because it included only those Black and White managers who were rated by both a Black and a White rater from each of the three perspectives. Results for rater race indicated that Black raters from all perspectives (peers, subordinates, and bosses) assigned more favorable ratings to ratees of their own race. Results for White raters differed according to the particular rating source. White bosses assigned more favorable ratings to ratees of their own race, but White subordinates did not. White peers assigned more favorable ratings to Whites in the repeated measures analysis, but not in the between-subjects analysis. Results for ratee race indicated that both White and Black managers received higher ratings from Black raters than from White raters, and the effect was more pronounced for ratings assigned to Black managers.

15.

Evidence for genetic influences on personality from self-reports and informant ratings.

A C Heath M C Neale R C Kessler L J Eaves K S Kendler 《Journal of personality and social psychology》1992,63(1):85-96

Self-report data on Extraversion (E) and Neuroticism (N), together with ratings by the co-twin, were obtained from a sample of 826 adult female twin pairs ascertained through a population-based twin register. Data were analyzed using a model that allowed for the contributions to personality ratings of the rater's personality (rater bias) as well as of the personality of the person being rated. For E, but not for N, significant rater bias was found, with extraverted respondents tending to underestimate, and introverted respondents tending to overestimate, the Extraversion of their co-twins. Good agreement between self-reports and ratings by the respondent's co-twin was found for both E and N. Substantial genetic influences were found for both personality traits, confirming findings from genetic studies of personality that have relied only on self-reports of respondents.

16.

Measurement of subjective responses to alcohol and non-alcohol slides by alcoholic respondents

R M Costello D P Rice L Schoenfeld 《Behaviour research and therapy》1974,12(1):35-40

Although ‘anxiety’ has traditionally been hypothesized as the cognitive mediating CR (conditioned response) that results from electrical aversive conditioning, alternative competing hypotheses have not been adequately studied. This investigation has generated four constructs from a factor analysis of adjective ratings on alcohol-related stimuli and four constructs from an identical analysis of ratings on non-alcohol-related stimuli. As these constructs are orthogonal, each can be used as an independent measure of ‘cognitive mediator’ or alternative competing hypothesis. The alcohol-related constructs appeared to be: (1) dangerous vs. safety; (2) approach or appetitiveness; (3) avoidance or aversiveness; and (4) general evaluation, good vs. bad.

17.

RATER SOURCE EFFECTS ARE ALIVE AND WELL AFTER ALL

BRIAN HOFFMAN CHARLES E. LANCE BETHANY BYNUM WILLIAM A. GENTRY 《Personnel Psychology》2010,63(1):119-151

Recent research has questioned the importance of rater perspective effects on multisource performance ratings (MSPRs). Although this research makes a valuable contribution, we hypothesize that it has obscured evidence for systematic rater source effects as a result of misspecified models of the structure of MSPRs and inappropriate analytic methods. Accordingly, this study reexamines the impact of rater source on MSPRs by presenting a set of confirmatory factor analyses of two large samples of MSPR data in which source effects are modeled as second-order factors. Hierarchical confirmatory factor analysis of both samples revealed that the structure of MSPRs can be characterized by general performance, dimensional performance, idiosyncratic rater, and source factors, and that source factors explain (much) more variance in MSPRs, whereas general performance explains (much) less variance, than was previously believed. These results reinforce the value of collecting performance data from raters occupying different organizational levels and have important implications for research and practice.

18.

Validation of affective and neutral sentence content for prosodic testing

Jeff B. Russ Ruben C. Gur Warren B. Bilker 《Behavior research methods》2008,40(4):935-939

Conducting a study of emotional prosody often requires that one have a valid set of stimuli for assessing perceived emotion in vocal intonation. In this study, we created a list of sentences with both affective and neutral content, and then validated them against rater opinion. Participants read sentences with content that implied happiness, sadness, anger, fear, or neutrality and rated how well they could imagine each sentence being expressed in each emotion. Coefficients of variation and intraclass correlations were calculated to narrow the list to affective sentences that had high agreement and neutral sentences that had low agreement. We found that raters could easily identify most emotional content and did not ascribe any unique emotion to most neutral content. We also found differences between the intensity of male and female ratings. The final list of sentences is available on the Internet (www.med.upenn.edu/bbl/) and can be recorded for use as stimuli for prosodic studies.
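The coefficient of variation used for screening is the standard deviation of an item's ratings divided by their mean, so lower values mean raters agreed more closely. The sketch below shows a screening step of that kind; the sentence IDs, ratings, and the cutoff value are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(ratings):
    """Sample standard deviation divided by the mean of one item's ratings."""
    return stdev(ratings) / mean(ratings)


# Hypothetical "how well can you imagine this sentence said angrily?" ratings (1-5).
sentence_ratings = {
    "s1": [5, 5, 4, 5, 5],
    "s2": [1, 5, 3, 2, 4],
}
CUTOFF = 0.25  # hypothetical threshold, not taken from the study
high_agreement = [s for s, r in sentence_ratings.items()
                  if coefficient_of_variation(r) < CUTOFF]
print(high_agreement)  # ['s1']
```

For neutral sentences the logic runs the other way: the aim is for no single emotion to attract consistently high ratings, so high CVs or uniformly low means are the desirable outcome.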

19.

Race of ratee and anonymity of rater: a study comparing students with practitioners as performance appraisers

Beaulieu RP 《Psychological reports》2005,97(2):389-399

123 students and 123 nonstudent supervisors viewed videotapes which displayed four supposed subordinate supervisors, two African Americans and two Caucasians, who individually described their respective performances during the past year. After being told either that the supposed subordinates would or that they would not have access to the performance rating, the subjects rated the performance of those subordinate supervisors. While anonymity of rater and race of rater had no evaluative effect on the performance ratings given by the nonstudent subjects, the student subjects gave higher ratings when they believed that their ratings would be made public. Also, the nonstudent subjects' ratings differed as a function of whether they worked closely with others of another race and as a function of the frequency with which they actually discussed performance evaluations with their own subordinates.

20.

Competencies, Personality Traits, and Organizational Rewards of Middle Managers: A Motive-Based Approach

Laura Guillén Willem E. Saris 《Human Performance》2013,26(1):66-92

Previous literature suggests that performance ratings are saturated with rater-related idiosyncratic variance. Given that modern psychometric theories relegate this source of variance to measurement error, it has not been the subject of much previous research. Of importance, identifying and estimating the variance components underlying idiosyncratic rater variance will inform our understanding of the nature of this variance. In a sample of managerial performance ratings we report on components of variance and find that the idiosyncratic rater variance component is about one third rater main effects variance, one third Rater × Ratee interaction effects variance, and one third upper-bound Rater × Ratee × Dimension interaction effects variance. Further, results indicate that variance components are moderated by the acquaintanceship time between the rater and the ratee.