首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 468 毫秒
1.
主观评分中多面Rasch模型的应用   总被引:1,自引:1,他引:0  
主观评分中存在的不一致性导致主观评分的信度降低。多面Rasch模型基于项目反应理论,可以应用于评分员效应的识别和消除,从而提高主观评分的信度。该文介绍多面Rasch模型的理论和应用框架,介绍了国外相关的典型应用,并且讨论了该模型的应用条件。  相似文献   

2.
多面Rasch模型理论及其在结构化面试中的应用   总被引:1,自引:0,他引:1  
针对影响面试效度的各种误差来源,该文引入了一种新颖的面试结果处理方法:多面Rasch模型。这一模型在结构化面试中的应用不但有利于有效测量被试的能力水平,而且为识别问题评委、进一步完善评分规则、实现面试等值等问题都提供了全新的解决思路。文章在对结构化面试信、效度研究进展进行综述的基础上,介绍了多面Rasch模型的理论及其在结构化面试中的应用框架。  相似文献   

3.
关丹丹 《心理学探新》2014,34(5):437-440
为了评价和改进硕士研究生入学考试一般能力测试的写作评分,研究者采用概化理论和多面Rasch分析对113位考生的写作样本的评分误差来源、评分信度等进行了探讨.概化理论研究显示,评分者和题目对评分准确性影响不大,以两道写作题的考试设计而言,评分者为2人即可保证评分信度在0.75以上.多面Rasch分析显示,评分者宽严度的估计值及其误差均在可接受的范围内,评分者之间在宽严度上不存在显著差异,且评分者自身在评分时总体上比较稳定.但个别评分者在特定考生特定题目上表现出特殊偏向.概化理论和多面Rasch分析丰富了写作评分研究的量化指标,证实了硕士研究生入学考试一般能力测试的写作评分具有较高的信度.  相似文献   

4.
刘昊  刘肖岑  冯晓霞 《心理科学》2013,36(2):484-488
本研究的目的在于应用Rasch模型编制和分析数学入学准备测验,从而分析Rasch模型的有效性和优势。自编数学入学准备测试,对150名平均年龄为6.6岁的儿童进行测查,应用Rasch模型对题目和评分等级做出修正并分析结果。结果表明修正后的测试具有较好的信效度,较好地拟合了Rasch模型,评分等级设置合理,测试的整体难度相对较低。儿童的Rasch分数和性别无关,但受到年龄、家庭社会经济地位的影响。相对于经典测量理论而言,应用Rasch模型进行入学准备测试的编制和分析具有优势。  相似文献   

5.
创造力测评中的评分者效应(rater effects)是指在创造性测评过程中, 由于评分者参与而对测评结果造成的影响.评分者效应本质上源于评分者内在认知加工的不同, 具体体现在其评分结果的差异.本文首先概述了评分者认知的相关研究, 以及评分者,创作者,社会文化因素对测评的影响.其次在评分结果层面梳理了评分者一致性信度的指标及其局限, 以及测验概化理论和多面Rasch模型在量化,控制该效应中的应用.最后基于当前研究仍存在的问题, 指出了未来可能的研究方向, 包括深化评分者认知研究,整合不同层面评分者效应的研究, 以及拓展创造力测评方法和技术等.  相似文献   

6.
多维题组效应Rasch模型   总被引:2,自引:0,他引:2  
首先, 本文诠释了“题组”的本质即一个存在共同刺激的项目集合。并基于此, 将题组效应划分为项目内单维题组效应和项目内多维题组效应。其次, 本文基于Rasch模型开发了二级评分和多级评分的多维题组效应Rasch模型, 以期较好地处理项目内多维题组效应。最后, 模拟研究结果显示新模型有效合理, 与Rasch题组模型、分部评分模型对比研究后表明:(1)测验存在项目内多维题组效应时, 仅把明显的捆绑式题组效应进行分离而忽略其他潜在的题组效应, 仍会导致参数的偏差估计甚或高估测验信度; (2)新模型更具普适性, 即便当被试作答数据不存在题组效应或只存在项目内单维题组效应, 采用新模型进行测验分析也能得到较好的参数估计结果。  相似文献   

7.
分别采用四维度和十五维度Rasch模型分析包含项目内多维度结构的科学测验数据,估计两种维度结构下维度分数的信度.结果表明,对比相应的单维模型而言,四维度与十五维度Rasch模型均能够极大提高各内容维度上分数估计的信度.四维度与十五维度Rasch模型拟合结果的比较表明,对于总长度固定的测验,维度数目的增加能够补偿子维度长度减少引起的信度损失.但是这一作用必须以维度间较高的相关性为前提.  相似文献   

8.
该研究应用GT和多面Rasch模型对结构化面试数据进行分析,并提出一些建议针对某辅导员招聘面试数据,运用GT从宏观上分析应聘者、考官和项目所带来的总体误差大小,在此基础上,运用多面Rasch模型从微观上进一步探查考官严厉度、应聘者能力差异、项目难易度及侧面偏差.结果表明:1)GT分析表明应聘者产生的变异较大(90.65%),说明面试可靠性较高,且当考官数为2时可靠性已较好.2)多面Rasch模型分析出了各侧面效应中的非拟合因素及交互效应中的偏差因素,表明面试误差主要来自考官间严厉度的差异及其自身一致性的不稳定。将GT与多面Rasch模型相结合分析面试数据不仅能测查出评价过程各方面的问题因素,并能更好地作整体把握。  相似文献   

9.
晏子 《心理科学进展》2010,18(8):1298-1305
Rasch模型是在国外学术界受到广泛关注和深入研究的一个潜在特质模型。该模型为解决心理科学领域内测量的客观性问题提供了一个可行性很高的解决方案。而国内关于Rasch模型的理论探讨和应用研究却并不多见。不同于一般项目反应理论, Rasch模型要求所收集的数据必须符合模型的先验要求, 而不是使用不同的参数去适应数据的特点。Rasch模型的主要特点(包括个体与题目共用标尺、线性数据、参数分离)确保了客观测量的实现。未来关于Rasch模型的研究方向包括多维度Rasch模型、测验的等值与链接、计算机自适应性考试, 大型应用测量系统(比如Lexile系统)等等。  相似文献   

10.
采用Rosenberg自尊量表(RSES)对425名在校大学生进行施测,应用项目反应理论的Rasch模型对项目指标进行分析及DIF检验。结果表明,Rosenberg自尊量表具有单维性,量表的信度为0.84; 除项目8以外,其他项目拟合指标良好,较适用来区分中等及偏低自尊水平的个体,项目功能差异检验发现在项目1和项目5上存在DIF,表现为男生自尊水平要高于女生。相对于经典测量理论,应用Rasch模型分析Rosenberg自尊量表具有优势,为进一步的完善和使用该自尊量表提供依据。  相似文献   

11.
朱宇  冯瑞龙  辛涛 《心理科学》2013,36(2):479-483
本研究以概化理论为视角,搜集了新HSK五级模拟书写题的作答和评分数据,估算了题型、题量、评卷员人数、评阅速度等潜在影响效应的方差分量,考察了新HSK书写成绩的可靠性,并探索了改善该分数可靠性的途径。基于概化理论和规划求解的数据分析发现了题量的调整方案以及题型、题量、评卷员人数的最优组合方案。本研究对评阅速度进行的分析属于前沿性的理论探索,而其他数据分析结果,则可能有益于旨在改进该测试质量的决策实践。  相似文献   

12.
While the Angoff (1971) is a commonly used cut score method, critics ( Berk, 1996; Impara & Plake, 1997 ) argue the Angoff places too‐high cognitive demands on raters. In response to criticisms of the Angoff, a number of modifications to the method have been proposed. Some suggested Angoff modifications include using an iterative rating process, presenting judges with normative data about item performance, revising the rating judgment into a Yes/No decision, assigning relative weights to dimensions within a test, and using item response theory in setting cut scores. In this study, subject matter expert raters were provided with a ‘difficulty anchored’ rating scale to use while making Angoff ratings; this scale can be viewed as a variation of the Angoff normative data modification. The rating scale presented test items having known p‐values as anchors, and served as a simple means of providing normative information to guide the Angoff rating process. Results are discussed regarding reliability of the mean Angoff rating (.73) and the correlation of mean Angoff ratings with item difficulty (observed r ranges from .65 to .73).  相似文献   

13.
Recent years have shown increased awareness of the importance of personality tests in educational, clinical, and occupational settings, and developing faking-resistant personality tests is a very pragmatic issue for achieving more precise measurement. Inspired by Stark (2002) and Stark, Chernyshenko, and Drasgow (2005), we develop a pairwise preference-based personality test that aims to measure multidimensional personality traits using a large-scale statement bank. An experiment compares the resistance of the developed personality test to faking with that of rating scale-based personality tests in the item response theory model framework. Results show that latent traits estimated from the personality test based on the rating scale method are severely biased, and that faking effect can be pragmatically ignored in the personality test developed based on the pairwise preference method.  相似文献   

14.
A multivariate, cognitive evaluation model was used to examine the previously noted effects of frame-of-reference (FOR) training on rating accuracy. The tested model emphasized two content-related issues associated with FOR training: (a) the content of the training itself and the extent that raters agreed with it, which provided an index of the amount of overlap between the theory of performance taught in training and a rater's implicit theory of performance following training; and (b) the content of raters' performance impressions of ratees based on a comprehensive and integrative model of the cognitive representations of persons— Associated Systems Theory (Carlston, 1992, 1994). Undergraduates (N= 172) were trained with FOR or control procedures using two different performance theories, observed and rated videotaped manager performance on three performance dimensions, and engaged in written free-recall of target performance vignettes. A structural model was tested incorporating a latent differential rating accuracy construct. FOR training was associated with higher demonstrated levels of agreement and with changes in ratee representation that were more abstract and more target-referent (i.e., less idiosyncratic) than control training. Agreement was found to be associated with enhanced rating accuracy and more self-referent impressions; however, target-referent impressions were shown to be related to better rating accuracy. Discussion focuses on the interrelated nature of content and process issues in performance appraisal. Using the AST framework, it is also proposed that longstanding manager/subordinate dyads could be expected to benefit the most from FOR training.  相似文献   

15.
Two different item response theory model frameworks have been proposed for the assessment and control of response styles in rating data. According to one framework, response styles can be assessed by analysing threshold parameters in Rasch models for ordinal data and in mixture‐distribution extensions of such models. A different framework is provided by multi‐process item response tree models, which can be used to disentangle response processes that are related to the substantive traits and response tendencies elicited by the response scale. In this tutorial, the two approaches are reviewed, illustrated with an empirical data set of the two‐dimensional ‘Personal Need for Structure’ construct, and compared in terms of multiple criteria. Mplus is used as a software framework for (mixed) polytomous Rasch models and item response tree models as well as for demonstrating how parsimonious model variants can be specified to test assumptions on the structure of response styles and attitude strength. Although both frameworks are shown to account for response styles, they differ on the quantitative criteria of model selection, practical aspects of model estimation, and conceptual issues of representing response styles as continuous and multidimensional sources of individual differences in psychological assessment.  相似文献   

16.
The purpose of this study was to approach the issue of rating ability by examining the influence of rater implicit theories and rater intelligence on rating outcomes. Using the inferential accuracy model (Jackson, 1972), raters were identified as either possessing a normative or idiosyncratic implicit theory of the occupation of college instructor. In a laboratory setting, 50 normative and 50 idiosyncratic raters judged the videotaped performance of either a good or poor lecturer. Results showed that (a) intelligence was positively related to rating accuracy and to possessing a normative implicit theory, (b) rater type moderated the relationship between intelligence and rating accuracy, and (c) controlling for intelligence, normative raters committed stronger halo effects than idiosyncratic raters. These results were discussed in relation to furthering the understanding of rating ability.  相似文献   

17.
认知诊断模型选择是认知诊断评估中重要研究问题之一。在实际应用中实践者并不知道真正拟合数据的模型,通常会用模型拟合指标检验模型与数据的拟合程度。从测量结果质量来看,除保证模型与数据拟合之外,还需要重点评价模型诊断结果的信度和效度等。考虑到以往研究大都采用基于信息量的拟合指标去判定模型与数据的匹配性,本研究提出综合考虑模型拟合指标与信度指标用于模型选择或评价模型误设。考虑实验因素为真实模型或分析模型(DINA模型、G-DINA模型、R-RUM模型)、样本量、题量和属性个数,在五因素(3×3×2×2×2)实验设计条件下,比较Bootstrap区间估计的属性分类一致性信度平均数与标准误和常用的拟合统计量-2LL、AIC、BIC对正确模型的选择率。结果表明:-2LL在题目数量多的情况下表现较好,而AIC、BIC在被试量较大的情况下表现较好,在不同的研究条件下,-2LL、AIC、BIC的模型选择率很不稳定,而用Bootstrap法估计的属性分类一致性信度平均数和标准误在不同研究条件的模型选择率较稳定,总体表现较好。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号