Similar articles
 Found 20 similar articles; search took 31 ms
1.
The possibility for age discrimination and stereotypes to affect performance evaluations is rising. Although careful evaluations might be expected from conscientious raters, little is known about whether they might show more or less bias towards certain age groups. Therefore, in our study using a time-lagged design, we investigated the effects of rater conscientiousness on the performance evaluations of younger and older actual co-workers (N = 242). We found that raters who were more conscientious provided higher ratings for older workers than for younger workers on task performance and organizational citizenship behaviours. Specifically, we tested the model of mediated moderation, in which the relation between rater conscientiousness and ratee age predicts ratee-perceived conscientiousness, which in turn predicts performance ratings. The model was significant for older ratees, but not for younger ratees. We discuss our results in terms of the “similar to me” effects and implications for organizational practices.

2.
In performance appraisals, some assessors are substantially more lenient than others. Research on this effect in appraisals involving communication and interaction between raters and ratees after the performance evaluation has taken place indicates that it may be at least partly caused by individual differences in assessor personality. However, little is known about the impact or causes of rater severity versus leniency in situations in which there is little or no contact between raters and ratees after the performance evaluation. In Study 1 (N = 174) the strength of the severity–leniency effect in this ‘no‐contact’ context is estimated and found to be similar to that reported for ‘with‐contact’ appraisals. No evidence of an association between assessor personality and assessor severity (vs. leniency) is found in the ‘no‐contact’ context. In Study 2 (N = 54) there is no evidence of an association between the fluid cognitive ability of assessors and the severity of their ratings in a no‐contact context. It is concluded that the severity versus leniency effect probably has a considerable impact on performance ratings in ‘no‐contact’ appraisal settings, but that neither rater personality nor rater cognitive ability appear to play a significant role in this.

3.
We tested the effects of rater agreeableness on the rating of others’ poor performance in performance appraisal (PA). We also examined the interactions between rater agreeableness and two aspects of the rating context: ratee self‐ratings and the prospect of future collaboration with the ratee. Participants (n = 230) were allocated to one of six experimental groups (a 3 × 2 between‐groups design) or a control group (n = 20). Participants received accurate, low‐deviated, or high‐deviated self‐ratings from the ratee. Half were notified they would collaborate with the ratee in a future task. High rater agreeableness, positive deviations in self‐rating, and the prospect of future collaboration were all independent predictors of higher PA ratings. The interactions between rater agreeableness and rating context were very small. We argue that conflict avoidance is an important motivation in the PA process.

4.
Research exploring the effects of physical attractiveness frequently assesses attractiveness by employing subjective appraisals by independent raters. However, there is reason to believe that rater characteristics – especially their sex – may systematically bias subjective ratings of physical attractiveness. The current study explores this possibility by analyzing data from the National Longitudinal Study of Adolescent Health (N = 13,330). Analyses of these data revealed that ratings of physical attractiveness are significantly influenced by the sex of the interviewer/rater. Specifically, male raters were significantly less likely than female raters to assess males as attractive, very attractive, and very unattractive. The results are explained within the context of evolutionary psychology and illustrate a methodological concern for research on physical attractiveness.

5.
The social skills of 20 second- and sixth-grade students were assessed by 20 trained raters using the Social Skills Test for Children (SST-C). Rater and child characteristics were examined to determine whether differences in social skills ratings were due to the race of the rater or the race of the children being rated or due to the interactive effects of these characteristics, which would suggest racial bias in the ratings procedure. The results showed that the race of the rater did affect some behavioral observations. Black raters gave higher scores than white raters on four behavioral categories: response latency, appropriate assertion, effective assertion, and smiling. White raters gave higher scores for head position and gestures. The results of this study replicated earlier findings of significant differences in social skills ratings due to the race and age of the child being rated. The results also showed modest racial bias effects in that black and white raters scored black and white children differentially on two behavioral categories: overall skill ratings and smiling. These results suggested that most behavioral categories of the SST-C were not systematically affected by racial bias. However, the most subjective rating, overall skill, did evidence racial bias effects. This finding is consistent with previous data showing that subjective ratings may be most affected by racial bias.

6.
7.

Attention-deficit/hyperactivity disorder (ADHD) is a childhood-onset condition that may continue into adulthood. When assessing adult patients, clinicians usually rely on retrospective reports of childhood symptoms to evaluate the age-of-onset criterion. Since inaccurate symptom recall may impede the diagnosis and treatment of ADHD, knowledge about the factors influencing retrospective reports is needed. This longitudinal study investigated (a) the accuracy of retrospective symptom ratings by adult participants with a childhood diagnosis of ADHD (self-ratings) and parents or significant others (proxy ratings), and (b) the influence of current ADHD symptom severity and ADHD-associated impairments on retrospective symptom ratings. Participants (N = 55) were members of the Cologne Adaptive Multimodal Treatment (CAMT) study who had been referred and treated for ADHD in childhood and were reassessed in adulthood (average age 27 years). Participants’ retrospective self-ratings were substantially lower than, and did not correlate with, parents’ ADHD symptom ratings provided at study entry, while retrospective symptom ratings provided by proxy respondents correlated moderately with parents’ childhood ratings. In addition, participants were more likely to underreport childhood symptoms (79%) and more frequently denied the presence of three or more childhood symptoms (17%) compared to proxy respondents (65% underreporting, 10% false-negative recall). Proxy respondents’ symptom recall was best predicted by childhood ADHD, while participants’ symptom recall was best predicted by current ADHD symptom severity. ADHD-associated impairments were not correlated with symptom recall after controlling for childhood ADHD. Together, these findings suggest a recall bias in adult patients and question the validity of retrospective reports, even in clinical samples.

8.
Obtaining data from multiple informants provides a more comprehensive diagnostic picture in the assessment of attention deficit hyperactivity disorder (ADHD). Differences in symptom ratings have been observed between parent- and teacher-report scales, though less information is available regarding differences between mothers and fathers. To address this gap, this study examines the rater agreement between mothers and fathers on the Diagnostic and Statistical Manual of Mental Disorders – Fourth Edition (DSM-IV) ADHD Symptom Rating Scale (DSM-ADHD-SRS). The participants consisted of 337 children diagnosed with ADHD who underwent comprehensive neuropsychological assessment. Confirmatory factor analysis indicates that a three-factor model comprising inattention, hyperactivity, and impulsivity symptoms provides the best fit for both mothers’ and fathers’ ratings. Mothers provided higher mean ratings for the inattention scale. These results suggest that the factor structure for the DSM-ADHD-SRS is the same, regardless of parent gender. However, symptoms of inattention may vary depending upon which parent completes the ratings. This discrepancy could lead to differences in diagnostic impressions in clinical evaluations.

9.
Inter‐rater reliability and accuracy are measures of rater performance. Inter‐rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter‐rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert‐generated criterion ratings and between raters using intraclass correlation (2,1). Inter‐rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter‐rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter‐rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter‐rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
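
The intraclass correlation (2,1) named in this abstract can be sketched roughly as follows. This is only an illustration, assuming the common two-way random-effects, single-rater, absolute-agreement form and a complete table with no missing ratings; the function name `icc_2_1` and the rows-by-columns input layout are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc_2_1(table):
    """ICC(2,1) for a complete table: rows = rated targets, columns = raters.

    Assumed form (not confirmed by the abstract): two-way random effects,
    single rater, absolute agreement. Raises ZeroDivisionError when the
    ratings show no variance at all.
    """
    n = len(table)        # number of rated targets
    k = len(table[0])     # number of raters
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]

    # Sums of squares for targets, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                  # between-targets mean square
    jms = ss_cols / (k - 1)                  # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

Under these assumptions, the same function could be applied to a table pairing one rater with the criterion ratings (accuracy) or to a table pairing two raters with each other (inter-rater reliability), which is the comparison the abstract describes.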

10.
Research exploring the effects of physical attractiveness frequently assesses attractiveness by employing subjective appraisals by independent raters. However, there is reason to believe that rater characteristics – especially their sex – may systematically bias subjective ratings of physical attractiveness. The current study explores this possibility by analyzing data from the National Longitudinal Study of Adolescent Health (N = 13,330). Analyses of these data revealed that ratings of physical attractiveness are significantly influenced by the sex of the interviewer/rater. Specifically, male raters were significantly less likely than female raters to assess males as attractive, very attractive, and very unattractive. The results are explained within the context of evolutionary psychology and illustrate a methodological concern for research on physical attractiveness.

11.
Rater bias in the EASI temperament scales: a twin study   (Total citations: 1; self-citations: 0; citations by others: 1)
Under trait theory, ratings may be modeled as a function of the temperament of the child and the bias of the rater. Two linear structural equation models are described, one for mutual self- and partner ratings, and one for multiple ratings of related individuals. Application of the first model to EASI temperament data collected from spouses rating each other shows moderate agreement between raters and little rating bias. Spouse pairs agree moderately when rating their twin children, but there is significant rater bias, with greater bias for monozygotic than for dizygotic twins. MLEs of heritability are approximately .5 for all temperament scales with no common environmental variance. Results are discussed with reference to trait validity, the person-situation debate, halo effects, and stereotyping. Questionnaire development using ratings on family members permits increased rater agreement and reduced rater bias.

12.

Purpose

The study specified an alternate model to examine the measurement invariance of multisource performance ratings (MSPRs) to systematically investigate the theoretical meaning of common method variance in the form of rater effects. As opposed to testing invariance based on a multigroup design with raters aggregated within sources, this study specified both performance dimension and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research such that MSPRs were found to lack invariance for raters from different levels. However, same level raters provided equivalent ratings in terms of both the performance dimension loadings and rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applies an alternative model to examine invariance of MSPRs that allowed us to answer three questions that would not be possible with more traditional multigroup designs. First, the model allowed us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance in raters from the same source, relative to raters from different sources. Finally, our study allowed for inferences with respect to the invariance of idiosyncratic rater factors.

13.
Recent research has questioned the importance of rater perspective effects on multisource performance ratings (MSPRs). Although making a valuable contribution, we hypothesize that this research has obscured evidence for systematic rater source effects as a result of misspecified models of the structure of multisource performance ratings and inappropriate analytic methods. Accordingly, this study provides a reexamination of the impact of rater source on multisource performance ratings by presenting a set of confirmatory factor analyses of two large samples of multisource performance rating data in which source effects are modeled in the form of second-order factors. Hierarchical confirmatory factor analysis of both samples revealed that the structure of multisource performance ratings can be characterized by general performance, dimensional performance, idiosyncratic rater, and source factors, and that source factors explain (much) more variance in multisource performance ratings whereas general performance explains (much) less variance than was previously believed. These results reinforce the value of collecting performance data from raters occupying different organizational levels and have important implications for research and practice.

14.
A program is described for computing interrater reliability by averaging, for each rater, the correlations between one rater’s ratings and every other rater’s ratings. For situations in which raters rate more than one ratee, raters’ reliabilities can be computed for either each item or each ratee. The program reads data from a text file and puts the reliability coefficients in a text file. The standard Macintosh interface is implemented. The Quick-BASIC program is distributed both as a listing and in compiled form; it can be run with advantage with math coprocessors.
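
The averaging procedure described in this abstract can be illustrated with a short sketch. It is not the Quick-BASIC program itself: the function names `pearson` and `rater_reliabilities`, the dictionary input, and the choice of the Pearson correlation are all assumptions made for illustration, since the abstract does not say which correlation coefficient or data layout the program uses.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rater_reliabilities(ratings):
    """Average each rater's correlation with every other rater.

    `ratings` maps a rater label to that rater's scores, listed in the
    same order (for example one score per ratee, or one per item).
    """
    result = {}
    for rater, scores in ratings.items():
        others = [
            pearson(scores, other_scores)
            for other, other_scores in ratings.items()
            if other != rater
        ]
        result[rater] = mean(others)
    return result
```

A call such as `rater_reliabilities({"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]})` returns one averaged coefficient per rater; a rater whose average is much lower than the others disagrees with the rest of the panel.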

15.
This meta-analysis reviewed the magnitude and moderators of the relationship between rater liking and performance ratings. The results revealed substantial overlap between rater liking and performance ratings (ρ = .77). Although this relationship is often interpreted as indicative of bias, we review studies that indicate that to some extent the relationship between liking and performance ratings potentially reflects “true” differences in ratee performance. Moderator analyses indicated that the relationship between liking and performance ratings was weaker for ratings of organizational citizenship behaviors, ratings made by peer raters, ratings in nonsales jobs, and ratings made for development; however, the relationship was strong across moderator levels, underscoring the robustness of this relationship. Implications for the interpretation of performance ratings are discussed.

16.

The aim of this paper is to empirically assess the reliability of the plan formulation method for couples, a procedure for formulating the case, planning, and monitoring the couple therapies according to control-mastery theory. We hypothesized that when couples are looking for couple therapy, they have an unconscious couple’s plan for the therapy, which includes the couple’s goals; the pathogenic beliefs that the partners want to disprove; the traumas from which these beliefs originated and that the partners want to master; the vicious relational circles that make the couple suffer and that the couple wants to break; the virtuous relational circles that are expressions of the couple’s resources and that the couple wants to fuel; and the relational insights that may help the couple get better. Our study involved 15 couples treated by four experienced therapists. Four raters independently formulated each couple’s plan based on the first three sessions following a standard procedure, and we calculated the intraclass correlation for pooled judges’ ratings. For a subsample of three couples—who before and after treatment had completed the dyadic adjustment scale (DAS) and the outcome questionnaire-45.2 (OQ-45.2)—the compatibility of the therapists’ interventions with the couples’ and partners’ plans was assessed. The relationship between the ratings of compatibility, DAS and OQ-45.2, was assessed. The results showed excellent interjudge reliability for each couple’s plan formulation (average ICC = 0.82), attesting to the validity of the procedure; and preliminary data on the therapeutic process suggested that therapists’ interventions compatible with couple’s plans could help partners achieve good outcomes.

17.
The main purposes of this study are to examine whether multisource feedback ratings predict leaders' organizational goal performance, and whether the relationships are consistent across the two rating purposes (developmental, administrative), two leadership behavior dimensions (Consideration, Initiating Structure), and three rating perspectives (supervisor, self, and ‘other’ raters, i.e., peers and subordinates). Leaders (n = 396) in a large organization in the transportation industry participated in two multisource feedback programs, the first for developmental purposes and the second 8 months later for administrative purposes. Approximately 1 month later, they were rated by their supervisor on their effectiveness in attaining five organizational performance goals (financial, safety, customer satisfaction, employee satisfaction, diversity). Results revealed that both developmental ratings and administrative ratings uniquely predicted leaders' goal performance. However, both leadership dimension and rater perspective moderated these relationships. Leadership behaviors associated with Consideration were stronger predictors of goal performance for supervisor ratings, whereas behaviors associated with Initiating Structure were stronger predictors of goal performance for self and other ratings.

18.
This study extends multisource feedback research by assessing the effects of rater source and raters' cultural value orientations on rating bias (leniency and halo). Using a motivational perspective of performance appraisal, the authors posit that subordinate raters followed by peers will exhibit more rating bias than superiors. More important, given that multisource feedback systems were premised on low power distance and individualistic cultural assumptions, the authors expect raters' power distance and individualism-collectivism orientations to moderate the effects of rater source on rating bias. Hierarchical linear modeling on data collected from 1,447 superiors, peers, and subordinates who provided developmental feedback to 172 military officers show that (a) subordinates exhibit the most rating leniency, followed by peers and superiors; (b) subordinates demonstrate more halo than superiors and peers, whereas superiors and peers do not differ; (c) the effects of power distance on leniency and halo are stronger for subordinates than for peers and superiors; (d) the effects of collectivism on leniency were stronger for subordinates and peers than for superiors; effects on halo were stronger for subordinates than superiors, but these effects did not differ for subordinates and peers. The present findings highlight the role of raters' cultural values in multisource feedback ratings.

19.
Retrospective rating scales are widely used for formal assessment of typical performance. Raters who are the most familiar/interactive with ratees are routinely recommended to maximize the quality of ratings. This caveat to use the most familiar/interactive raters fails to distinguish sampling parameters of the observations on which ratings are based that may be important to assessing different classes of behavior. We hypothesized that systematic observational schedules would be of greater importance to ratings of public events than familiarity/interaction, per se, while the caveat would hold for ratings of private events. We used the Psychotic Inpatient Profile (PIP), which provides separate factor scores for ratings of public and private events, to examine these hypotheses in a quasi-experimental study with adult inpatients of mental hospitals. A large multiinstitutional data set provided retrospective PIP ratings by two types of raters. The most familiar/interactive local clinical staff for each client completed the PIP after observing on an ad lib schedule, along with ongoing job duties. Unfamiliar, noninteractive raters completed the PIP for each client after observing on a systematic time-sampling schedule for purposes of coding an entirely different instrument. Data were selected so that each of 189 clients received PIP scores from four raters, reflecting functioning during the same time period based on day-shift observations by one rater of each type and evening-shift observations by one rater of each type. Analyses of variance, consistency/discriminability of ratings, and prediction of social-action outcomes all supported the hypotheses. We discuss alternative strategies that are better for assessing typical performance in most circumstances. We also provide recommendations for improving the adequacy of observations for those circumstances in which the standardized retrospective rating scale could be a cost-effective assessment strategy.
This study was the basis of a master's thesis at the University of Houston by the senior author under the direction of the junior authors. Richard M. Rozelle served on the examination committee. This study was partially supported by grants to Gordon L. Paul from the National Institute of Mental Health, Public Health Service (MH-15353; MH-25464); the Illinois Department of Mental Health and Developmental Disabilities; the Joyce Foundation; the MacArthur Foundation; the Owsley Foundation; the Cullen Foundation; and the Center for Public Policy of the University of Houston.

20.
The retrospective evaluation of an event tends to be based on how the experience felt during the most intense moment and the last moment. Two experiments tested whether this so-called peak-end effect influences how primary school students are affected by peer assessments. In both experiments, children (ages 7–12) assessed two classmates on their behaviour in school and then received two manipulated assessments. In Experiment 1 (N = 30), one assessment consisted of four negative ratings and the other of four negative ratings with an extra moderately negative rating added to the end. In Experiment 2 (N = 44), one assessment consisted of four positive ratings, and the other added an extra moderately positive rating to the end. Consistent with the peak-end effect, the extended assessment in Experiment 1 and the short assessment in Experiment 2 were remembered as more pleasant and less difficult to deal with, which shaped children’s peer assessment preferences and prospective choices of which assessment to repeat. These findings indicate that the process of peer assessment can be improved by ending the feedback with the most positive part of the assessment.
