Similar articles
1.
This study examined the impact of various components of rater training on the accuracy of ratings made with Direct Behavior Rating-Single Item Scales (DBR-SIS). Specifically, it investigated adding frame-of-reference and rater error training components to a standard package consisting of an overview followed by modeling, practice, and feedback. In addition, the amount of exposure to the direct training component (i.e., the number of practice and feedback opportunities) was evaluated, and the rates at which behavior was displayed were carefully manipulated to control for, and evaluate, training impact by target and rate of behavior. The sample consisted of undergraduate students assigned to one of six conditions. Overall, findings suggested that completing a training package did enhance accuracy when DBR-SIS was used to rate academic engagement and disruption. However, the results also indicated that the most comprehensive DBR training package may not always yield greater improvements than a standard package involving direct training. In general, a more intensive training package appeared beneficial for improving ratings of targets that had previously been difficult to rate accurately (e.g., disruptive behavior occurring at a medium rate). Limitations and implications for future research are discussed.

2.
This study examined the validity of the personality structure based on the Five-Factor Model using both self- and peer reports on twins' NEO-PI-R facets. By separating common from specific genetic variance in self- and peer reports, the study examined the genetic substance of different trait levels and of rater-specific perspectives on personality judgments. Data from 919 twin pairs were analyzed with a multiple-rater twin model to disentangle genetic and environmental effects on domain-level trait variance, facet-specific trait variance, and rater-specific variance. About two thirds of both the domain-level and the facet-specific trait variance was attributable to genetic factors, suggesting that the more accurately personality is measured, the better these measures reflect its genetic structure. Specific variance in self- and peer reports also showed modest to substantial genetic influence. This may reflect not only genetically influenced self-rater biases but also substantive components specific to self- and peer raters' perspectives on the traits actually measured.

3.
4.
Much research shows that judgmental estimates can be improved by combining estimates across independent judges as well as within judges. These results have been obtained mostly with judgments about matters of fact, that is, judgments for which objective truth criteria exist. In the present research, we extend these findings to performance evaluations. In a controlled field study, expert judges evaluated a large number of essays written by college applicants taking college entrance tests. The judges evaluated each essay twice, on two occasions a week apart. This design allowed us to assess the benefits of two methods of combining evaluations: within raters and across raters. Both methods yielded accuracy gains. Although within-rater combinations yielded smaller gains than across-rater combinations, the within-rater gains were still appreciable. Our findings extend the class of judgments to which the "wisdom of many" can be applied, and they are potentially applicable to performance evaluations in social, educational, and employment settings.
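A minimal sketch of why combining evaluations can help, using hypothetical scores rather than data from the study: for absolute error, the error of an averaged rating can never exceed the mean error of the individual ratings (triangle inequality), and the gain is largest when the ratings err on opposite sides of the criterion. The function name `compare_averaging`, the two rating lists, and the criterion score of 4.0 are illustrative assumptions.

```python
from statistics import mean


def compare_averaging(ratings: list[float], truth: float) -> tuple[float, float]:
    """Return (mean absolute error of the individual ratings, absolute error of their average)."""
    individual = mean(abs(r - truth) for r in ratings)
    combined = abs(mean(ratings) - truth)
    return individual, combined


criterion = 4.0                       # hypothetical "true" score for one essay
across_raters = [3.0, 4.5, 5.0, 3.5]  # hypothetical: four judges, one occasion each
within_rater = [3.2, 4.4]             # hypothetical: one judge, two occasions a week apart

for label, ratings in [("across raters", across_raters), ("within rater", within_rater)]:
    individual, combined = compare_averaging(ratings, criterion)
    print(f"{label}: mean individual error {individual:.2f}, error of average {combined:.2f}")
```

Absolute error is used here only because it makes the "average never does worse" property easy to see; squared error has the same property by Jensen's inequality.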

5.
Raters who pursue different goals give different ratings   Total citations: 5; self-citations: 0; citations by others: 5
J. N. Cleveland and K. R. Murphy (1992) suggested that phenomena such as rater errors and interrater disagreements could be understood in terms of differences in the goals that different raters pursue. We measured 19 rating goals of students at the beginning of a semester, grouped them into scales, and correlated these scales with teacher evaluations collected at the end of the semester. We found significant multiple correlations, both within classes and in an analysis of the pooled sample (adjusting for instructor mean differences, incremental R² = .08). Measures of rating goals obtained after raters had observed a significant proportion of ratee performance accounted for variance (incremental R² = .07) not accounted for by measures of goals obtained at the beginning of the semester.

6.
The interrater reliability of eight teacher rating scales designed to assess characteristics of attention-deficit hyperactivity disorder was investigated. Co-teachers of 46 students completed the rating scales. The students, ages 8–17, were designated as having a Serious Emotional Disturbance. Interrater reliability correlation coefficients ranged from .62 to .87. The percentage of variance shared between raters ranged from a low of 38.4% (the ACTeRS Oppositional factor and the CBCL-TRF Attention Problems factor) to 75.7% (ADHD Rating Scale). The percentage of shared variance was higher for younger children. Kappa coefficients for rater agreement were highest at the cutoff of two standard deviations above the mean. The reliability coefficients were consistent with those reported in prior research.
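The shared-variance figures above are consistent with squaring the interrater correlations (.62² ≈ .384 and .87² ≈ .757). A short sketch of that conversion, plus a standard Cohen's kappa for two raters' above/below-cutoff decisions; the function names and the ten example classifications are hypothetical, and the abstract does not specify the study's exact kappa procedure.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two raters' scores share: the squared correlation."""
    return r ** 2


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two raters' binary classifications (1 = at or above the cutoff)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n                # each rater's rate of flagging
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # agreement expected by chance
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


for r in (0.62, 0.87):
    print(f"r = {r:.2f} -> shared variance {shared_variance(r):.1%}")

# Hypothetical cutoff decisions by two co-teachers for ten students
teacher_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
teacher_2 = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(f"kappa = {cohens_kappa(teacher_1, teacher_2):.2f}")
```

The first loop prints 38.4% and 75.7%, the endpoints of the reported range. Kappa corrects raw agreement (here 80%) for the agreement two raters would reach by chance given how often each one flags students.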

7.
Although curriculum-based measures of oral reading (CBM-R) have strong technical adequacy, there is still reason to believe that student performance may be influenced by factors in the testing situation, such as errors examiners make in administering and scoring the test. This study examined the construct-irrelevant variance introduced by examiners using a cross-classified multilevel model. We sought to determine how much of the variance in student CBM-R scores was attributable to examiners and, if present, the extent to which it was moderated by students' grade level and English learner (EL) status. Fit indices indicated that a cross-classified random effects model (CCREM), with measures nested within students, students nested within schools, and examiners crossed with schools, fit the data best. Intraclass correlations from the CCREM revealed that roughly 16% of the variance in student CBM-R scores lay between examiners. The remaining variance was distributed as follows: measurement level, 3.59%; between students, 75.23%; and between schools, 5.21%. Results were moderated by grade level but not by EL status. The discussion addresses the implications of this error for low-stakes and high-stakes decisions about students, for teacher evaluation systems, and for hypothesis testing in reading intervention research.

8.
A three-level piecewise growth model (3L-PGM) can be used to break nonlinear growth into multiple components, providing the opportunity to examine potential sources of variation in individual and contextual growth within different segments of the model. The conventional 3L-PGM assumes that the data are strictly hierarchical, with measurement occasions (level 1) nested within individuals (level 2) who are members of a single cluster (level 3). In longitudinal research, however, data sometimes do not remain purely clustered throughout a study, for example when some students change classrooms or schools over time. One resulting data structure is known as a multiple membership structure, in which some lower-level units are members of more than one higher-level unit. The new multiple membership PGM (MM-PGM) extends the 3L-PGM to handle the multiple membership data structures frequently found in the social sciences. This study examined the consequences of ignoring individual mobility across clusters when estimating a 3L-PGM, compared with estimating an MM-PGM. MM-PGM estimates were less biased (especially the cluster-level coefficient estimates), although we found substantial bias in cluster-level variance components under some conditions for both models.
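A sketch of how a multiple membership structure is commonly represented, assuming time-proportional membership weights (an illustrative choice, not necessarily the weighting used in the study): a mobile student's cluster contribution is a weighted sum of the effects of every cluster attended, with weights summing to 1, whereas a strictly hierarchical 3L-PGM credits the student to a single cluster. The function name `mm_cluster_effect`, the school names, and the effect values are hypothetical.

```python
def mm_cluster_effect(weights: dict[str, float], effects: dict[str, float]) -> float:
    """Weighted cluster contribution for one student under multiple membership.

    weights maps cluster id -> share of the study period the student spent there;
    the shares must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to 1")
    return sum(w * effects[c] for c, w in weights.items())


# Hypothetical school-level random effects
school_effects = {"school_A": 0.40, "school_B": -0.20}

stayer = {"school_A": 1.0}                  # stayed in one school throughout
mover = {"school_A": 0.5, "school_B": 0.5}  # half the study period in each school

print(mm_cluster_effect(stayer, school_effects))
print(mm_cluster_effect(mover, school_effects))
# Ignoring mobility, e.g. assigning the mover only to the last school attended:
print(mm_cluster_effect({"school_B": 1.0}, school_effects))
```

For the stayer, both approaches agree; for the mover, forcing a single cluster attributes the full effect of one school to a student who spent only half the period there, which is one route to the cluster-level bias the abstract describes.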

9.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.
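To illustrate how systematic rater effects enter generalizability theory, the sketch below uses standard one-facet formulas for a crossed ratee × rater design with one score per cell (a textbook design assumed here; this is not the authors' analysis). Variance components are estimated from two-way ANOVA mean squares. In the hypothetical matrix, rater 2 is uniformly 2 points more lenient than rater 1, so the two raters correlate perfectly and the relative coefficient is 1.0, while the absolute (dependability) coefficient drops to 5/11 because the rater main effect counts against it. The function name `g_study` and the data are illustrative.

```python
from statistics import mean


def g_study(scores: list[list[float]]) -> dict[str, float]:
    """One-facet crossed (ratee x rater) G study with one score per cell.

    scores[p][r] is rater r's rating of ratee p. Negative variance estimates are
    set to 0. Assumes the ratee variance plus residual variance is nonzero.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    p_means = [mean(row) for row in scores]
    r_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_pr_e = ms_res                         # ratee x rater interaction + residual error
    var_p = max((ms_p - ms_res) / n_r, 0.0)   # ratee ("universe score") variance
    var_r = max((ms_r - ms_res) / n_p, 0.0)   # rater main effect (leniency/severity)

    return {
        "var_p": var_p,
        "var_r": var_r,
        "var_pr_e": var_pr_e,
        # Relative coefficient, single rater: rater main effects do not change rank order
        "g_single": var_p / (var_p + var_pr_e),
        # Absolute coefficient, single rater: rater main effects count as error
        "phi_single": var_p / (var_p + var_r + var_pr_e),
    }


scores = [[1, 3], [2, 4], [3, 5], [4, 6]]  # hypothetical: 4 ratees x 2 raters
for name, value in g_study(scores).items():
    print(f"{name}: {value:.3f}")
```

The contrast between `g_single` and `phi_single` is one concrete case of the abstract's argument: the same systematic rater variance is invisible to an interrater correlation yet matters whenever absolute rating levels are used.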

10.
Social disorganization theory suggests that certain school-level indicators of disorder may be important predictors of bullying-related attitudes and behaviors. Multilevel analyses were conducted on bullying-related attitudes and experiences among 22,178 students in 95 elementary and middle schools. The intraclass correlation coefficients indicated that 0.6–2% of the variance in victimization, 5–10% of the variance in retaliatory attitudes, 5–6% of the variance in perceptions of safety, and 0.9% of the variance in bullying perpetration was associated with the clustering of students within schools. Although the specific associations varied somewhat between elementary and middle schools, the hierarchical linear modeling analyses generally suggested that school-level indicators of disorder (e.g., student–teacher ratio, concentration of student poverty, suspension rate, and student mobility) were significant predictors of bullying-related attitudes and experiences. Student-level characteristics (i.e., sex, ethnicity, status in school) were also relevant to students' retaliatory attitudes, perceptions of safety, and involvement in bullying. Implications for school-based research and violence prevention are provided.
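Intraclass correlations like these express between-school variance as a share of total variance. A minimal sketch of the one-way ANOVA estimator ICC(1) for a balanced design with a continuous outcome, using hypothetical scores; the study itself used hierarchical linear models, so this simpler estimator only illustrates the quantity and does not reproduce the reported values. The function name `icc1` and the data are illustrative.

```python
from statistics import mean


def icc1(groups: list[list[float]]) -> float:
    """ICC(1): estimated share of total variance lying between clusters (one-way ANOVA)."""
    k = len(groups[0])  # students per school; this sketch assumes equal group sizes
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes of at least 2")
    n_groups = len(groups)
    group_means = [mean(g) for g in groups]
    grand = mean(x for g in groups for x in g)

    ss_between = k * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))

    var_between = max((ms_between - ms_within) / k, 0.0)  # between-school variance
    return var_between / (var_between + ms_within)


# Hypothetical outcome scores for three students in each of three schools
schools = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(f"ICC(1) = {icc1(schools):.2f}")
```

An ICC near 0, like the 0.6–2% reported for victimization, means that knowing a student's school says little about that student's score, even when school-level predictors remain significant.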

11.
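As examination reforms and large-scale assessment practices in China and abroad increasingly emphasize constructed-response (subjective) items, rater reliability has once again become a topic of considerable interest. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposes and examines a graded response multilevel facets model. The results show that, under both the rater fixed-effects and rater random-effects conditions, the means and standard deviations of the bias values were small, indicating that under the current simulation conditions the model recovers parameters accurately and robustly and can detect rater effects. Predictors of rater effects can therefore be added in subsequent work, developing the model into a complete model that simultaneously detects rater effects and the factors that influence them.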

12.
TRAIT, RATER AND LEVEL EFFECTS IN 360-DEGREE PERFORMANCE RATINGS   Total citations: 2; self-citations: 0; citations by others: 2
Method and trait effects in multitrait-multirater (MTMR) data were examined in a sample of 2,350 managers who participated in a developmental feedback program. Managers rated their own performance and were also rated by two subordinates, two peers, and two bosses. The primary purpose of the study was to determine whether method effects are associated with the level of the rater (boss, peer, subordinate, self), with each individual rater, or with both. Previous research that tacitly assumed method effects to be associated with the level of the rater included only one rater from each level; consequently, method effects due to the rater's level may have been confounded with those due to the individual rater. Confirmatory factor analyses showed that, of the five models tested, the best fit was the 10-factor model hypothesizing 7 method factors (one for each individual rater) and 3 trait factors. These results suggest that method variance in MTMR data is more strongly associated with individual raters than with the rater's level. Implications for research and practice pertaining to multirater feedback programs are discussed.

13.
In this study, we focus on a three-level meta-analysis for combining data from studies using multiple-baseline across-participants designs. A complicating factor in such designs is that results might be biased if the dependent variable is affected by external events that are not explicitly modeled, such as a teacher's illness, an exciting class activity, or the presence of a foreign observer. In multiple-baseline designs, external effects can become apparent if they simultaneously affect the outcome scores of the participants within a study. This study presents a method for adjusting the three-level model for external events and evaluates the appropriateness of the modified model. To this end, we conduct a simulation study, and we illustrate the new approach with real data sets. The results indicate that ignoring an external event effect yields biased estimates of the treatment effects, especially when only a small number of studies and measurement occasions are involved. The mean squared error, standard error, and coverage proportion of the effect estimates are all improved with the modified model. Moreover, the adjusted model yields less biased variance estimates. If there is no external event effect, we find no differences in results between the modified and unmodified models.

14.
This research used logistic regression to model item responses from a popular 360-degree-for-development survey used in a leadership development programme for middle- and upper-level European managers in Brussels. The survey contained 106 items on 16 scales. The model used ratee gender and rater group to identify items that exhibited differential item functioning (DIF). The rater groups were self, boss, peer, and direct report. The sample consisted of 356 survey families, where a survey family was a matched set of four surveys: one self, one boss, one peer, and one direct report, for a total of 1,424 surveys. Raters were 88% male and 12% female. Items were flagged for differential functioning using an effect size computed from Wald chi-square statistics rather than statistical significance, which resulted in fewer flagged items. One item exhibited rating anomalies due to ratee gender; 55 items exhibited DIF attributable to rater group. The apparent effect of DIF was small for each item. An examination of the maximum likelihood parameter estimates suggested that the rater group DIF resulted from either hierarchical complexity or organizational contingency. The DIF due to gender conformed to prior expectations of gender-related stereotypical interpretations. This research further suggested that DIF due to environmental complexity or organizational contingency may be a naturally occurring phenomenon in some 360-degree assessments, and that interpreting some 360-degree feedback may need to take the potential for such DIF into account.

15.
Teacher stress and burnout are associated with many adverse outcomes for teachers, students, and the educational system. This paper describes the Coping-Competence-Context (3C) Theory of Teacher Stress. The theory is based on the empirical research on teacher stress and coping highlighted in this special issue and attempts to identify more explicitly three critical, interconnected pathways for teacher stress development and intervention. The 3C model also highlights why teacher stress is important and should be a topic of future inquiry by showing clear links between teacher stress and adverse student and teacher outcomes. Lastly, this paper identifies leverage points for intervention and describes a future research agenda in three domains: measurement, conceptual, and intervention issues and challenges.

16.
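This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by Kang, Sun, and Zeng (2016) when predictors at both the examinee and rater levels are included (the full model). The results show that: (1) the full GR-MLFM is logically and mathematically sound, can be used in scoring contexts for constructed-response items, and can effectively detect rater effects, their influencing factors, and the magnitude of those influences; (2) in scoring practice for mathematical problem solving, raters showed two types of rating tendencies (leniency and severity effects), but for the great majority of raters the degree of leniency or severity was not pronounced; raters' conscientiousness positively predicted their severity and their self-confidence positively predicted their leniency, whereas emotional stability and rating experience were not significant predictors.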

17.

Purpose

The study specified an alternate model to examine the measurement invariance of multisource performance ratings (MSPRs) to systematically investigate the theoretical meaning of common method variance in the form of rater effects. As opposed to testing invariance based on a multigroup design with raters aggregated within sources, this study specified both performance dimension and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research in that MSPRs were found to lack invariance across raters from different levels. However, raters from the same level provided equivalent ratings in terms of both the performance dimension loadings and the rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applies an alternative model to examine the invariance of MSPRs, which allowed us to answer three questions that more traditional multigroup designs cannot address. First, the model allowed us to examine the impact of parameterizing idiosyncratic rater factors on inferences about cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance among raters from the same source, relative to raters from different sources. Finally, our study allowed inferences about the invariance of idiosyncratic rater factors.

18.
Previous literature suggests that performance ratings are saturated with rater-related idiosyncratic variance. Because modern psychometric theories relegate this source of variance to measurement error, it has not been the subject of much previous research. Importantly, identifying and estimating the variance components underlying idiosyncratic rater variance will inform our understanding of the nature of this variance. In a sample of managerial performance ratings, we report components of variance and find that the idiosyncratic rater variance component is about one third rater main effect variance, one third Rater × Ratee interaction variance, and one third upper-bound Rater × Ratee × Dimension interaction variance. Further, the results indicate that the variance components are moderated by how long the rater and the ratee have been acquainted.

19.
One of the central objectives of inclusive education, and of education in general, is to support not only every student's academic learning but also their social and emotional development. It is therefore important to identify difficulties in a child's socio-emotional development at school. The current study investigates students' emotional inclusion, social inclusion, and academic self-concept from four different perspectives using the Perceptions of Inclusion Questionnaire (PIQ). In particular, we analyzed the degree of agreement between teacher, mother, and father ratings and students' self-reports. Moreover, we tested whether students' gender and special educational needs (SEN) predict possible bias in parent and teacher reports. Survey participants included 721 Austrian Grade 4 students from 48 classes. In addition, data from 46 teachers, 466 mother reports, and 375 father reports were included. We assessed the consistency (i.e., agreement) between the different raters by means of multitrait-multimethod analyses, more precisely a correlated trait–correlated method minus one (CT-C[M-1]) model. Results of the CT-C(M-1) analyses indicated rather strong rater bias (i.e., method effects) for all three dimensions of inclusion. However, consistency was higher for academic self-concept than for emotional and social inclusion. Furthermore, gender and SEN status affected rater bias, particularly in teacher reports. The results suggest that it matters who reports students' emotional inclusion, social inclusion, and academic self-concept, which has methodological and practical implications.

20.
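For tests that take a long time to score, the effect of time on scoring accuracy cannot be ignored, and research on rater drift has therefore attracted considerable attention. Based on the graded response multilevel facets model proposed by Kang, Sun, and Zeng (2016), this study constructs a graded response multilevel rater drift model for detecting rater drift and verifies the model's performance through a simulation study. The results show that the model estimates item and ability parameters accurately, and that, compared with the fixed-effects model, the rater random-effects model detects rater drift more effectively, with better validity and stability.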

