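Using the many-facet Rasch model (MFRM) from item response theory (IRT), this study analyzes assessment results from the leaderless group discussion (LGD), an assessment center technique. It examines examinee ability levels, rater severity and leniency, the internal consistency of ratings, dimension difficulty, and rating categories, and then discusses the various biases involved. Analyzing personnel assessment results with MFRM gives a deeper understanding of true differences in examinee ability, discriminates dimension difficulty, and probes sources of measurement error. This in turn helps improve the construction of assessment items, evaluate or diagnose rater qualification, and better match assessment dimensions to assessment purposes, offering a distinctive perspective for extending the application of item response theory in personnel assessment.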
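For orientation, the many-facet Rasch model is commonly written in the rating-scale form below. This is a sketch of a standard formulation, not necessarily the exact parameterization used in the study; the symbol names are assumptions chosen to match the facets the abstract lists (examinee ability, dimension difficulty, rater severity, rating category).

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that examinee $n$ receives category $k$ from rater $j$ on dimension $i$, $B_n$ is the examinee's ability, $D_i$ the difficulty of the dimension, $C_j$ the severity of the rater, and $F_k$ the threshold for moving from category $k-1$ to category $k$, all expressed in logits.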

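An IRT Analysis of Rater Bias in Structured Interviews for National Civil Servants   Total citations: 7 (self-citations: 1; citations by others: 6)
Sun Xiaomin (孙晓敏), Zhang Houcan (张厚粲). Acta Psychologica Sinica (《心理学报》), 2006, 38(4): 614-625
Using the many-facet Rasch model from item response theory (IRT), this study analyzed rater bias among 12 raters in two panels in structured interviews for national civil servants. Two types of rater bias were proposed and verified: differences between raters in severity, and problems with each rater's self-consistency. Results showed that raters differed significantly in severity, and that the self-consistency of individual raters' rating behavior across examinees, dimensions, examinee gender, and time also varied. The study shows that analysis at the level of the individual rater overcomes the limitation of classical test theory (CTT), which analyzes raters only as a group, and provides detailed, specific diagnostic information on each rater's biased behavior. It thereby offers a modern psychometric method for targeted rater training and for building a rater pool.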

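This study applies generalizability theory (GT) and the many-facet Rasch model to structured interview data and offers some recommendations. Using data from a recruitment interview for student counselors, GT is applied at the macro level to analyze the overall magnitude of error contributed by applicants, interviewers, and items. On this basis, the many-facet Rasch model is applied at the micro level to further probe interviewer severity, differences in applicant ability, item difficulty, and facet bias. Results: (1) The GT analysis shows that applicants account for a large share of the variance (90.65%), indicating high interview reliability, and reliability is already good with two interviewers. (2) The many-facet Rasch analysis identified misfitting elements in the facet effects and biased elements in the interaction effects, indicating that interview error stems mainly from differences in severity between interviewers and from instability in their self-consistency. Combining GT with the many-facet Rasch model not only detects problem factors in each aspect of the evaluation process but also gives a better overall picture.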

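Application of the Many-Facet Rasch Model in Structured Interviews   Total citations: 1 (self-citations: 0; citations by others: 1)
Sun Xiaomin (孙晓敏), Xue Gang (薛刚). Acta Psychologica Sinica (《心理学报》), 2008, 40(9): 1030-1040
Using the many-facet Rasch model from item response theory, this study analyzed the scores of 66 examinees in a structured interview. It removed the influence on raw scores of error introduced by raters and other situation-specific measurement factors, and obtained examinee ability estimates together with individual-level rater consistency information. Comparing decisions based on ability estimates with decisions based on raw interview scores showed that measurement error did affect decisions, and for some examinees the effect was very large. Facets bias analysis and Facets analysis of rater severity were then used to trace the sources of error. The results show that when raw interview scores of examinees from different interview panels are compared directly, both rater self-consistency and between-rater differences in severity introduce error. The study indicates that using Facets ability estimates as the basis for decisions will improve the effectiveness of selection. In addition, the examinee-level rater consistency indices from the Facets analysis, along with the rater-by-examinee bias analysis, can provide detailed diagnostic information for locating sources of interview error.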
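To illustrate why raw scores from different panels are not directly comparable, the sketch below computes the expected raw score of the same examinee under a lenient and a severe rater, using a rating-scale many-facet Rasch model. It is a minimal illustration, not the Facets program's estimation procedure: the function names and every parameter value (ability, severities, thresholds) are hypothetical and are not taken from the study.

```python
import math


def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    thresholds[k-1] is the step parameter for moving from category k-1 to k.
    Returns probabilities for categories 0..len(thresholds).
    """
    eta = ability - severity - difficulty
    # Log-numerator of category k is the sum over steps h <= k of (eta - F_h);
    # category 0 has log-numerator 0.
    log_num = [0.0]
    for step in thresholds:
        log_num.append(log_num[-1] + eta - step)
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, severity, difficulty, thresholds):
    """Expected raw rating (category index weighted by its probability)."""
    probs = mfrm_category_probs(ability, severity, difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values in logits: one examinee, one dimension, two raters.
thresholds = [-1.5, -0.5, 0.5, 1.5]  # five rating categories, 0..4
lenient = expected_score(ability=0.5, severity=-1.0, difficulty=0.0,
                         thresholds=thresholds)
severe = expected_score(ability=0.5, severity=1.0, difficulty=0.0,
                        thresholds=thresholds)
print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

The examinee's ability is identical in both calls, so any gap between the two expected scores is due to rater severity alone. This is the kind of error that ability estimates are meant to remove.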

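Inconsistency in subjective scoring lowers its reliability. The many-facet Rasch model, which is based on item response theory, can be used to identify and remove rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, describes typical applications from outside China, and discusses the conditions under which the model applies.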

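Guan Dandan (关丹丹). Psychological Exploration (《心理学探新》), 2014, 34(5): 437-440
To evaluate and improve the scoring of the writing section of the general ability test in the master's entrance examination, the researchers used generalizability theory and many-facet Rasch analysis to examine sources of scoring error and scoring reliability for writing samples from 113 examinees. The generalizability study showed that raters and prompts had little effect on scoring accuracy; for a test design with two writing prompts, two raters are enough to keep scoring reliability above 0.75. The many-facet Rasch analysis showed that the rater severity estimates and their errors were within acceptable ranges, that raters did not differ significantly in severity, and that raters were on the whole stable in their own scoring. However, a few raters showed particular bias toward specific examinees on specific prompts. Generalizability theory and many-facet Rasch analysis enrich the quantitative indices available for research on writing scoring, and confirm that the writing scoring of the general ability test in the master's entrance examination has relatively high reliability.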
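The "how many raters are enough" conclusion comes from a decision study (D-study) in generalizability theory. The sketch below shows the calculation for a fully crossed persons x prompts x raters design, which is an assumption about the design rather than something the abstract states. The variance components are hypothetical placeholders, not the study's estimates, and `g_coefficient` is an illustrative name.

```python
def g_coefficient(var_p, var_pi, var_pr, var_pir_e, n_items, n_raters):
    """Generalizability coefficient for relative decisions in a crossed
    persons x items x raters design.

    var_p      person (examinee) variance, the object of measurement
    var_pi     person-by-item interaction variance
    var_pr     person-by-rater interaction variance
    var_pir_e  three-way interaction confounded with residual error
    """
    relative_error = (var_pi / n_items
                      + var_pr / n_raters
                      + var_pir_e / (n_items * n_raters))
    return var_p / (var_p + relative_error)


# Hypothetical variance components (NOT taken from the study).
components = dict(var_p=0.60, var_pi=0.10, var_pr=0.05, var_pir_e=0.25)

# D-study: fix two writing prompts and vary the number of raters.
for n_raters in (1, 2, 3, 4):
    coef = g_coefficient(n_items=2, n_raters=n_raters, **components)
    print(f"raters={n_raters}: G coefficient = {coef:.3f}")
```

Main effects of raters and prompts do not enter the relative error term, which is why only the interactions with persons appear. Adding raters shrinks the rater-related error terms with diminishing returns, so a D-study of this kind is used to find the smallest panel that meets a target reliability.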

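A Rasch Experimental Analysis of Scoring in HSK Subjective Tests   Total citations: 1 (self-citations: 0; citations by others: 1)
Inconsistency in subjective scoring lowers its reliability. The many-facet Rasch model, which is based on item response theory, can be used to identify and remove rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, designs a quality-control framework for HSK subjective test scoring based on the model, and validates it experimentally with HSK essay scoring data.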

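Examination reform and large-scale assessment practice in China and abroad increasingly emphasize subjective (constructed-response) items, so rater reliability has once again become a topic of wide concern. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposes and examines a graded response multilevel facets model. Results show that under both experimental conditions (rater fixed effects and rater random effects), the means and standard deviations of the bias values were small. This indicates that, under the present conditions, the model's parameter estimates show good recovery and robustness and the model can detect rater effects. Factors influencing rater effects can therefore be added in later work, developing it into a complete model that detects rater effects and their influencing factors simultaneously.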

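Four-dimensional and fifteen-dimensional Rasch models were each used to analyze science test data with a within-item multidimensional structure, and the reliability of dimension scores was estimated under both dimensional structures. Results show that, compared with the corresponding unidimensional models, both the four-dimensional and the fifteen-dimensional Rasch models greatly improve the reliability of score estimates on each content dimension. Comparison of the fit of the two models shows that, for a test of fixed total length, increasing the number of dimensions can compensate for the reliability loss caused by shorter subdimensions. However, this effect depends on relatively high correlations between dimensions.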

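This study examined the parameter recovery and applicability of the graded response multilevel facets model (GR-MLFM) proposed by Kang Chunhua, Sun Xiaojian, and Zeng Pingfei (2016) when predictors at the examinee and rater levels are included (the complete model). Results show: (1) The complete GR-MLFM is logically and mathematically sound, can be used in scoring situations for subjective items, and detects rater effects, their influencing factors, and the strength of those influences well. (2) In scoring practice for mathematical problem solving, raters showed two types of scoring tendency (leniency and severity effects), but for the great majority of raters severity or leniency was not pronounced. Raters' conscientiousness positively predicted their severity and their self-confidence positively predicted their leniency, whereas emotional stability and scoring experience were not significant predictors.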

This study explored how early childhood teachers (n=5) and young children (n=174) (age range 7 to 8 years; males = 81; females = 93) in two primary schools constructed and interpreted the right to HIV/AIDS education. Data were captured using individual interviews with teachers and group interviews with young children. Analysis of the data showed that teachers viewed young children's right to health information positively but did not consider the right to sexual information. Teachers operated within discourses which upheld the image of the child as innocent requiring protection from sexual knowledge. Children's perceptions of their rights to knowledge of sex in HIV/AIDS education showed ambiguity. Some accepted the right to know whilst others felt that knowledge about HIV/AIDS was inconsistent with childhood innocence.

Mild traumatic brain injury (mTBI) is a leading cause of injury among children, with approximately 15% of children experiencing a TBI prior to 15 years of age. Acutely, mTBI has been associated with a range of cognitive, physical, emotional and behavioural impairments. However, few studies have examined outcomes beyond five years post injury, long before the developmental process is complete and the full extent of any deficits may manifest. Our group had the unique opportunity to use data from a longitudinal birth cohort of 1265 children (Christchurch Health and Development Study) to examine the long term outcomes of early injury (0-5 years). Information about these children, including mTBI events, had been collected at birth, 4 months and at yearly intervals until age 16, and again at ages 18, 21 and 25 years. We found that even after statistical control for a wide range of child and family confounds, children who had been hospitalized for an mTBI had increased inattention/hyperactivity and conduct problems as rated by mothers and teachers over ages 7-13 years. Increased rates of psychiatric disorders were found over ages 14-16 years for those injured in the preschool years, including symptoms consistent with Attention Deficit/Hyperactivity Disorder (ADHD), Odds Ratio = 4.6, Conduct Disorder (CD), Odds Ratio = 5.6, and Substance Abuse, Odds Ratio = 9.1. Over ages 21-25, ongoing behaviour problems were assessed using self-reported arrests, violent offences and property offences. Compared to non-injured individuals, mTBI groups were more likely to be arrested and to be involved in property and violent offences. We controlled for a wide range of factors and there was still clear evidence of ongoing problems for individuals who had experienced an mTBI compared to their non-injured counterparts. These findings provide compelling evidence of long term psychosocial and psychiatric outcomes following mTBI.


Existing studies examining the development of temporal order memory show that although young children perform above chance on some tasks assessing temporal order memory, there are significant age-related differences across childhood. Yet, the trajectory of children’s ability to retrieve temporal order remains unclear as existing conclusions are drawn from cross-sectional studies. The present study utilized an accelerated longitudinal design in order to characterize the developmental trajectory of temporal order memory in a sample of 200 healthy 4- to 8-year-old children. Specifically, two tasks commonly used in the literature were tested longitudinally: a primacy judgment task and an ordering task. Results revealed that, even after controlling for differences in IQ, linearly increasing trajectories characterized age-related change in performance for both tasks; however, change appeared greater for the temporal ordering task. Further, performance on the two tasks was positively related, suggesting shared underlying mechanisms. These findings provide a more thorough understanding of temporal order memory in early to middle childhood by characterizing the developmental trajectories of two commonly used tasks and have important implications for our understanding of children’s developing memory more broadly.

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be “trained” using machine-learning techniques that incorporate human ratings. However, the quality of the human ratings used to train the AESEs is rarely examined. As a result, the impact of various rater effects (e.g., severity and centrality) on the quality of AESE-assigned scores is not known. In this study, we use data from a large-scale rater-mediated writing assessment to examine the impact of rater effects on the quality of AESE-assigned scores. Overall, the results suggest that if rater effects are present in the ratings used to train an AESE, the AESE scores may replicate these effects. Implications are discussed in terms of research and practice related to automated scoring.

We tested whether individual differences in a component of early conscience mediated relations between parental discipline and externalizing behavior problems in 238 3.5-year-olds. Parents contributed assessments of discipline practices and child moral regulation. Observations of children's behavioral restraint supplemented parental reports. Parents and teachers reported on child externalizing symptoms. Parental induction, warm responsiveness, and less frequent use of physical punishment generally were associated with higher levels of moral regulation and fewer externalizing problems. Moreover, moral regulation partially mediated relationships between discipline and externalizing symptoms, with the clearest case of mediation involving induction. However, relationships were found for boys only. Results support a mediation model wherein inductive and physical discipline may influence the expression of boys' externalizing behavior through effects on conscience. Finally, results suggest that different developmental processes may be associated with early externalizing problems in boys and girls, and confirm that fathers' reports contribute to our understanding of the origins of child externalizing problems.

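The Logic of Clinical Thinking   Total citations: 2 (self-citations: 0; citations by others: 2)
Clinical thinking is the core and foundation of clinical competence and a prerequisite for becoming a qualified physician. It is a way of thinking that reflects disease in accordance with the laws of logic. In clinical thinking, what matters most is the process of proposing and testing medical hypotheses, reasoning, and adhering to logical thinking. Logical thinking ability is crucial for medical professionals, who should continually strengthen their logical training and raise the standard of their clinical thinking.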

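Based on the stress-cognition model of insomnia, this study used a cyberbullying/cybervictimization questionnaire, an adolescent social anxiety questionnaire, the Center for Epidemiologic Studies Depression Scale, and the Pittsburgh Sleep Quality Index to survey 582 primary and secondary school students in three waves at half-year intervals. Structural equation modeling was used to examine the mediating roles of social anxiety and depressive mood in the effect of cyberbullying/cybervictimization on sleep quality, and gender differences in these roles. Results show: (1) Social anxiety and depressive mood both act as chain mediators in the path from cyberbullying/cybervictimization to sleep quality. (2) The chain mediation model of the effect of cyberbullying/cybervictimization on sleep quality differs by gender. This suggests that improving the sleep quality of cyberbullies and their victims requires attention to the emotional distress that cyberbullying causes them; only by resolving their emotional problems at the root can their sleep quality be effectively improved.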

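With upper-grade primary school students as participants and associated word pairs made up of highly familiar concrete nouns as materials, this study examined the accuracy of judgments of learning and foresight bias under different judgment modes. Findings: (1) The absolute accuracy of upper-grade primary school students' judgments of learning differs by grade. Overall, sixth graders showed good absolute accuracy under both immediate and delayed judgment, whereas fourth and fifth graders significantly overestimated under immediate judgment but showed good absolute accuracy under delayed judgment. Separate analysis of forward and backward word pairs showed that, under immediate judgment, fifth and sixth graders were fairly accurate on forward-associated pairs while fourth graders overestimated on them, and students in all three grades overestimated on backward pairs. Under delayed judgment, students in all three grades were fairly accurate on both forward and backward pairs. (2) Foresight bias in judgments of learning begins to appear in fifth graders. (3) Delayed judgment can improve the accuracy of upper-grade primary school students' judgments of learning and reduce or even eliminate foresight bias.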

Background: The conventional question (CQ) on subjective well-being (SWB) is, for example, “How is life?”, with ratings ranging between anchors such as ‘Best’ and ‘Worst possible’. Disadvantages may be casualness of responses and biases of proximate, peer or cultural relativity. Alternatively, with Anamnestic Comparative Self-Assessment (ACSA), the scale anchors are the respondents’ self-defined memories of their best and worst periods in life. Thus ACSA uses life review and experiential scale anchors. Objective: To compare the validity, sensitivity and responsiveness of the CQ and ACSA. Method: ACSA and the CQ were administered in parallel to 2584 university-hospital patients suffering from a wide range of psychiatric and somatic diseases. Results: ACSA and CQ did not measure the same construct (r = 0.50). CQ ratings were almost normally distributed, whereas ACSA ratings were overall lower, and clearly positively skewed, suggesting greater sensitivity to the respondents’ diseased state. Contrary to CQ, ACSA ratings of critically ill patients with end-stage liver disease were very low. After life-saving liver transplantation, ACSA ratings increased significantly more than CQ ratings, suggesting better responsiveness of ACSA to objective change. Trait-like socio-demographic variables such as sex, age, and marital status influenced CQ, but not ACSA ratings. Conclusion: In between-subject studies, depending on one’s study objectives, ACSA should be considered as a complement or an alternative to conventional SWB instruments. The CQ is probably preferable when socio-demographic variables are study endpoints. In longitudinal or intervention studies and for intercultural comparisons, ACSA, which reduces the need for correction of several biases or confounders, seems more useful.
