Similar articles
 20 similar articles found
1.
Multifaceted data are very common in the human sciences. For example, test takers' responses to essay items are marked by raters. If multifaceted data are analyzed with standard facets models, it is assumed there is no interaction between facets. In reality, an interaction between facets can occur, referred to as differential facet functioning. A special case of differential facet functioning is the interaction between ratees and raters, referred to as differential rater functioning (DRF). In existing DRF studies, the group membership of ratees is known, such as gender or ethnicity. However, DRF may occur when the group membership is unknown (latent) and thus has to be estimated from data. To solve this problem, in this study, we developed a new mixture facets model to assess DRF when the group membership is latent, and we provided two empirical examples to demonstrate its applications. A series of simulations was also conducted to evaluate the performance of the new model in the DRF assessment in the Bayesian framework. Results supported the use of the mixture facets model because all parameters were recovered fairly well, and the more data there were, the better the parameter recovery.

2.
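An IRT analysis of rater bias in structured interviews for national civil servants   Total citations: 7 (self-citations: 1, citations by others: 6)
孙晓敏 (Sun Xiaomin), 张厚粲 (Zhang Houcan). 《心理学报》 (Acta Psychologica Sinica), 2006, 38(4): 614-625
Using the many-facet Rasch model from item response theory (IRT), this study analyzed rater bias among 12 raters, in two panels, in structured interviews for national civil servants. Two kinds of rater bias were proposed and tested: differences in severity between raters and inconsistency within individual raters. The results showed that raters differed significantly in severity, and that raters also differed in the self-consistency of their rating behavior across candidates, dimensions, candidate gender, and time. The study shows that analysis at the level of the individual rater overcomes the limitation of classical test theory (CTT), which analyzes raters only as a group, and provides detailed diagnostic information about each rater's bias, thereby offering a modern measurement method for targeted rater training and for building a rater pool.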
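
As a rough illustration of how a many-facet Rasch model separates rater severity from candidate ability, the sketch below computes rating-category probabilities. It assumes the common rating-scale formulation, log(P_k / P_{k-1}) = theta - item_difficulty - rater_severity - tau_k; the abstract does not state which parameterization the authors used, and all function and parameter names here are hypothetical.

```python
import math


def mfrm_category_probs(theta, item_difficulty, rater_severity, thresholds):
    """Category probabilities under an assumed rating-scale many-facet Rasch model.

    thresholds[k-1] is the step parameter between categories k-1 and k.
    Returns a list of probabilities for categories 0..len(thresholds).
    """
    # Accumulate adjacent-category logits; category 0 is the reference (logit 0).
    logits = [0.0]
    for tau in thresholds:
        step = theta - item_difficulty - rater_severity - tau
        logits.append(logits[-1] + step)
    # Normalize with a max-shifted softmax for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected rating given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Same candidate and item, rated by a lenient and a severe rater (illustrative values).
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(0.5, 0.0, -1.0, steps)
severe = mfrm_category_probs(0.5, 0.0, 1.0, steps)
```

With these made-up values the lenient rater's expected score is higher than the severe rater's for the same candidate, which is the kind of between-rater severity difference the study reports.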

3.
Research studies in psychology and education often seek to detect changes or growth in an outcome over a duration of time. This research provides a solution to those interested in estimating latent traits from psychological measures that rely on human raters. Rater effects potentially degrade the quality of scores in constructed response and performance assessments. We develop an extension of the hierarchical rater model (HRM), which yields estimates of latent traits that have been corrected for individual rater bias and variability, for ratings that come from longitudinal designs. The parameterization, called the longitudinal HRM (L-HRM), includes an autoregressive time series process to permit serial dependence between latent traits at adjacent timepoints, as well as a parameter for overall growth. We evaluate and demonstrate the feasibility and performance of the L-HRM using simulation studies. Parameter recovery results reveal predictable amounts and patterns of bias and error for most parameters across conditions. An application to ratings from a study of character strength demonstrates the model. We discuss limitations and future research directions to improve the L-HRM.
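
The abstract mentions an autoregressive process with a growth parameter but gives no equations. The sketch below is only one plausible reading, assuming a first-order form theta_t = growth + rho * theta_{t-1} + noise; it is not the L-HRM itself (it omits the rater level entirely), and every name in it is hypothetical.

```python
import random


def simulate_latent_trajectory(n_timepoints, rho, growth, noise_sd,
                               theta0=0.0, rng=None):
    """Simulate one person's latent trait over time under an assumed AR(1) process.

    rho      : assumed serial dependence between adjacent timepoints
    growth   : assumed constant added at every step (overall growth)
    noise_sd : standard deviation of the Gaussian innovation
    """
    rng = rng or random.Random(0)  # fixed default seed so runs are repeatable
    thetas = [theta0]
    for _ in range(n_timepoints - 1):
        innovation = rng.gauss(0.0, noise_sd)
        thetas.append(growth + rho * thetas[-1] + innovation)
    return thetas


# With no noise the trajectory is deterministic, which makes the recursion easy to check.
noiseless = simulate_latent_trajectory(4, rho=0.5, growth=1.0, noise_sd=0.0)
noisy = simulate_latent_trajectory(6, rho=0.5, growth=1.0, noise_sd=0.3,
                                   rng=random.Random(42))
```

In a simulation study of the kind the abstract describes, trajectories like these would be generated for many persons and then passed through a rater model before checking parameter recovery.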

4.
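Rater effects in creativity assessment refer to the influence on assessment results that arises from raters' involvement in the assessment process. Rater effects essentially stem from differences in raters' internal cognitive processing and show up as differences in their ratings. This paper first reviews research on rater cognition and the influence of raters, creators, and sociocultural factors on assessment. Second, at the level of rating outcomes, it reviews indices of inter-rater reliability and their limitations, as well as the use of generalizability theory and the many-facet Rasch model to quantify and control rater effects. Finally, based on the problems that remain in current research, it points out possible future directions, including deepening research on rater cognition, integrating research on rater effects across levels, and extending creativity assessment methods and techniques.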

5.
In performance appraisals, some assessors are substantially more lenient than others. Research on this effect in appraisals involving communication and interaction between raters and ratees after the performance evaluation has taken place indicates that it may be at least partly caused by individual differences in assessor personality. However, little is known about the impact or causes of rater severity versus leniency in situations in which there is little or no contact between raters and ratees after the performance evaluation. In Study 1 (N = 174) the strength of the severity–leniency effect in this ‘no‐contact’ context is estimated and found to be similar to that reported for ‘with‐contact’ appraisals. No evidence of an association between assessor personality and assessor severity (vs. leniency) is found in the ‘no‐contact’ context. In Study 2 (N = 54) there is no evidence of an association between the fluid cognitive ability of assessors and the severity of their ratings in a no‐contact context. It is concluded that the severity versus leniency effect probably has a considerable impact on performance ratings in ‘no‐contact’ appraisal settings, but that neither rater personality nor rater cognitive ability appears to play a significant role in this.

6.
When analysts evaluate performance assessments, they often use modern measurement theory models to identify raters who frequently give ratings that are different from what would be expected, given the quality of the performance. To detect problematic scoring patterns, two rater fit statistics, the infit and outfit mean square error (MSE) statistics, are routinely used. However, the interpretation of these statistics is not straightforward. A common practice is that researchers employ established rule-of-thumb critical values to interpret infit and outfit MSE statistics. Unfortunately, prior studies have shown that these rule-of-thumb values may not be appropriate in many empirical situations. Parametric bootstrapped critical values for infit and outfit MSE statistics provide a promising alternative approach to identifying item and person misfit in item response theory (IRT) analyses. However, researchers have not examined the performance of this approach for detecting rater misfit. In this study, we illustrate a bootstrap procedure that researchers can use to identify critical values for infit and outfit MSE statistics, and we used a simulation study to assess the false-positive and true-positive rates of these two statistics. We observed that the false-positive rates were highly inflated, and the true-positive rates were relatively low. Thus, we proposed an iterative parametric bootstrap procedure to overcome these limitations. The results indicated that using the iterative procedure to establish 95% critical values of infit and outfit MSE statistics had better-controlled false-positive rates and higher true-positive rates compared to using the traditional parametric bootstrap procedure and rule-of-thumb critical values.
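
For readers unfamiliar with the two fit statistics, the sketch below uses their standard definitions: outfit is the mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of model variances. The bootstrap helper is a deliberately simplified assumption: it treats dichotomous model probabilities as known and does not re-estimate the model for each replicate, so it is not the authors' procedure (nor their iterative variant). All names are hypothetical.

```python
import random


def outfit_infit(observed, expected, variances):
    """Return (outfit, infit) mean-square statistics for one rater's observations."""
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(observed)
    # Infit: information-weighted, so off-target observations count less.
    infit = sum(sq_resid) / sum(variances)
    return outfit, infit


def bootstrap_outfit_bounds(probs, n_replications=1000, level=0.95, seed=0):
    """Simplified parametric bootstrap bounds for outfit under known probabilities."""
    rng = random.Random(seed)
    variances = [p * (1 - p) for p in probs]
    outfits = []
    for _ in range(n_replications):
        # Simulate model-consistent dichotomous responses.
        simulated = [1 if rng.random() < p else 0 for p in probs]
        outfits.append(outfit_infit(simulated, probs, variances)[0])
    outfits.sort()
    lower = outfits[int((1 - level) / 2 * (n_replications - 1))]
    upper = outfits[int((1 + level) / 2 * (n_replications - 1))]
    return lower, upper


stats = outfit_infit([1, 0, 1, 1], [0.5] * 4, [0.25] * 4)
bounds = bootstrap_outfit_bounds([0.2, 0.5, 0.8] * 10)
```

A rater whose observed outfit falls outside the simulated bounds would be flagged, in place of a fixed rule-of-thumb range.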

7.
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be “trained” using machine-learning techniques that incorporate human ratings. However, the quality of the human ratings used to train the AESEs is rarely examined. As a result, the impact of various rater effects (e.g., severity and centrality) on the quality of AESE-assigned scores is not known. In this study, we use data from a large-scale rater-mediated writing assessment to examine the impact of rater effects on the quality of AESE-assigned scores. Overall, the results suggest that if rater effects are present in the ratings used to train an AESE, the AESE scores may replicate these effects. Implications are discussed in terms of research and practice related to automated scoring.

8.
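This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by 康春花 (Kang Chunhua), 孙小坚 (Sun Xiaojian), and 曾平飞 (Zeng Pingfei) (2016) when predictors at the examinee and rater levels are included (the full model). Results showed that: (1) the full GR-MLFM is logically and mathematically sound, can be applied to the scoring of constructed-response items, and detects rater effects, their influencing factors, and the size of those influences well; (2) in scoring mathematical problem solving, raters showed two types of rating tendency (leniency and severity), but for the great majority of raters neither tendency was pronounced; raters' conscientiousness positively predicted their severity and their self-confidence positively predicted their leniency, whereas emotional stability and rating experience had no significant predictive effect.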

9.
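The many-facet Rasch model (MFRM) was used to analyze rater bias among 28 raters in the evaluation of education quality in early childhood care and education institutions. The results showed that the 28 raters differed significantly in severity; 3 raters had poor internal consistency, while the other 25 were fairly consistent; the interaction between raters and evaluated classes was not significant, whereas the interaction between raters and evaluation items was significant. The findings indicate that the MFRM allows rater bias in such evaluations to be analyzed at the individual level and, from the perspective of item response theory, provides a modern educational and psychometric basis for targeted rater training and for assessing rater qualification in order to build a pool of qualified raters.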

10.
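Examination reforms and large-scale assessments in China and abroad increasingly emphasize constructed-response items, so rater reliability has again become a topic of wide concern. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposed and examined a graded response multilevel facets model. Results showed that under both the fixed-effect and random-effect rater conditions, the means and standard deviations of the bias values were small, indicating that under the present experimental conditions the parameter estimates showed good recovery and robustness and that the model can detect rater effects. Factors influencing rater effects can therefore be added in later work, developing it into a full model that detects rater effects and their influencing factors simultaneously.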

11.
The present study examined the moderating effect of rater personality – extroversion and sensitivity to others – on the relations between selection interview ratings and measures of candidate self‐monitoring (SM) and social anxiety (SA). In a real‐life military selection procedure setting in which 445 candidates and 93 raters participated, rater extroversion moderated the relation between candidate SM and selection interview ratings so that this relation was negative for raters low on extroversion and positive for raters high on extroversion. Rater extroversion was also found to moderate the negative relation between candidate SA and selection interview ratings. No support was found for the moderating effect of rater sensitivity to others. An explanation of the moderating effect of rater extroversion based on the assumption that extroversion is negatively related to critical interpersonal sensitivity was suggested.

12.

Purpose  

The present study examined the effects of rater personality on the performance appraisal process. Specifically, we determined the relative weights that raters place on different performance dimensions when making overall performance evaluations, and examined whether rater personality influenced this weighting process. The literatures on social/political values and mate/friend selection were used as guiding frameworks in developing specific hypotheses.

13.

Purpose

The study specified an alternate model to examine the measurement invariance of multisource performance ratings (MSPRs) to systematically investigate the theoretical meaning of common method variance in the form of rater effects. As opposed to testing invariance based on a multigroup design with raters aggregated within sources, this study specified both performance dimension and idiosyncratic rater factors.

Design/Methodology/Approach

Data were obtained from 5,278 managers from a wide range of organizations and hierarchical levels, who were rated on the BENCHMARKS® MSPR instrument.

Findings

Our results diverged from prior research such that MSPRs were found to lack invariance for raters from different levels. However, same level raters provided equivalent ratings in terms of both the performance dimension loadings and rater factor loadings.

Implications

The results illustrate the importance of modeling rater factors when investigating invariance and suggest that rater factors reflect substantively meaningful variance, not bias.

Originality/Value

The current study applies an alternative model to examine invariance of MSPRs that allowed us to answer three questions that would not be possible with more traditional multigroup designs. First, the model allowed us to examine the impact of parameterizing idiosyncratic rater factors on inferences of cross-rater invariance. Next, including multiple raters from each organizational level in the MSPR model allowed us to tease apart the degree of invariance in raters from the same source, relative to raters from different sources. Finally, our study allowed for inferences with respect to the invariance of idiosyncratic rater factors.

14.
This study investigates the effects of rater personality (Conscientiousness and Agreeableness), rating format (graphic rating scale vs. behavioral checklist), and the rating social context (face‐to‐face feedback vs. no face‐to‐face feedback) on rating elevation of performance ratings. As predicted, raters high on Agreeableness showed more elevated ratings than those low on Agreeableness when they expected to have the face‐to‐face feedback meeting. Furthermore, rating format moderated the relationship between Agreeableness and rating elevation, such that raters high on Agreeableness provided less elevated ratings when using the behavioral checklist than the graphic rating scale, whereas raters low on Agreeableness showed little difference in elevation across different rating formats. Results also suggest that the interactive effects of rater personality, rating format, and social context may depend on the performance level of the ratee. The implications of these findings will be discussed.

15.
Social cognition theory asserts that perceivers (raters) assign stimulus persons (ratees) to social categories. These categories help the raters encode, store, and recall information. In a longitudinal design that represented a performance appraisal situation, this study examined the effects of information about a ratee's category membership on the amount of information that raters collected about the ratee prior to rating. One hundred fourteen subjects participated in three separate experimental sessions which spanned a 3-week time period. Among other tasks, subjects were required to rate a subordinate who was described in a manner which made it either difficult or easy to assign the subordinate to a social category. It was predicted and found that raters of ratees who were easily categorized spent less time observing the ratees' performance than raters of ratees who were less easily classified. Furthermore, results indicated that it was the effect of rater categorization on observation time that was critical to rating accuracy.

16.
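Compared with other assessment center techniques, assessor factors have an especially important influence on ratings in the leaderless group discussion. This study examined how novice assessors' working memory and personality affect the validity of their ratings in leaderless group discussions. First, novice assessors showed low inter-rater agreement and poor rating accuracy. Second, some components of working memory and personality affected rating validity in different ways: (1) the more altruistic the novice assessor, the more accurate the overall mean of their ratings and the more lenient the ratings; (2) the more decisive the novice assessor, the more accurately they differentiated among all candidates; (3) the more composed the novice assessor, the more effectively they differentiated among dimensions; (4) attention shifting and inhibition ability had a suppressing effect on novice assessors' halo effect and on the accuracy with which they differentiated among dimensions.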

17.
18.
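Application of the many-facet Rasch model in structured interviews   Total citations: 1 (self-citations: 0, citations by others: 1)
孙晓敏 (Sun Xiaomin), 薛刚 (Xue Gang). 《心理学报》 (Acta Psychologica Sinica), 2008, 40(9): 1030-1040
Using the many-facet Rasch model from item response theory, this study analyzed the structured interview scores of 66 candidates, removed the influence on raw scores of error introduced by raters and other features of the measurement situation, and obtained ability estimates for the candidates together with rater consistency information at the individual level. Comparing the decisions based on the ability estimates with those based on the interview scores showed that measurement error did affect decisions, and for some candidates the effect was very large. Facets bias analysis and Facets analysis of rater severity were then used to trace the sources of error. The results showed that when the raw interview scores of candidates from different interview panels are compared directly, both raters' self-consistency and differences in severity between raters lead to error. The study indicates that basing decisions on the Facets ability estimates would improve the validity of selection. In addition, the individual-level rater consistency indices from the Facets analysis, along with the rater-by-candidate bias analysis, can provide detailed diagnostic information for locating sources of interview error.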

19.
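赵群 (Zhao Qun), 曹亦薇 (Cao Yiwei). 《应用心理学》 (Chinese Journal of Applied Psychology), 2006, 12(3): 258-263
Portfolio assessment is valued because it can fully support student development and the improvement of teaching, but poor reliability and validity have limited its use in instructional evaluation. This paper reports an empirical study of the characteristics of rater reliability in portfolio assessment: 4 raters gave graded ratings to 152 portfolios on 2 occasions, and rater reliability was computed with several statistical methods. The results showed that portfolio ratings had high association, moderate-to-weak agreement, and a certain degree of stability, and that rating reliability was highest for the overall level of the portfolio. In this study, with 3 raters, both the generalizability coefficient and the dependability coefficient for ratings of the overall portfolio level were above 0.80.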
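
The generalizability and dependability coefficients mentioned above can be illustrated for the simplest case. The sketch assumes a fully crossed portfolios-by-raters design with one rating per cell, which may not match the authors' actual design (their study also had two rating occasions); the function names and the toy data are hypothetical.

```python
def variance_components(scores):
    """Estimate person, rater, and residual variance from a persons x raters matrix."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = (ss_total - ss_p - ss_r) / ((n_p - 1) * (n_r - 1))
    # Expected-mean-square solutions, truncated at zero.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    return var_p, var_r, ms_res


def g_coefficients(var_p, var_r, var_res, n_raters):
    """Return (generalizability, dependability) for a panel of n_raters raters."""
    g = var_p / (var_p + var_res / n_raters)              # relative decisions
    phi = var_p / (var_p + (var_r + var_res) / n_raters)  # absolute decisions
    return g, phi


# Toy data: 3 portfolios rated by 2 raters (not the study's data).
components = variance_components([[1, 2], [3, 4], [5, 6]])
g, phi = g_coefficients(1.0, 0.5, 0.5, n_raters=2)
```

Re-running `g_coefficients` with different values of `n_raters` is the usual way to ask how many raters are needed to reach a target such as 0.80.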

20.
The validity of five work commitment concepts is assessed via content analysis. The role of rater naivety (i.e., familiarity) with the concepts and measures used is also evaluated. Organizational commitment and Protestant work ethic were found to be least redundant. Naive raters demonstrated more redundancy than raters familiar with the concepts and measures. The implications of these findings for the study of work commitment and organizational research in general are discussed.
