Similar literature
 19 similar articles found (search time: 62 ms)
1.
简小珠  戴步云  戴海琦 《心理学报》2016,48(12):1625-1630
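Item difficulty and the weight an item carries according to the importance of what it assesses are two basic attributes of polytomously scored items, so they should be represented by different parameters in the IRT item characteristic function. Earlier polytomous models describe the difficulty of a polytomously scored item with several difficulty parameters and cannot effectively express the role of its score weight. Starting from the score-weighting role of polytomously scored items, this paper proposes a Logistic weighted model and sets out the ideas behind its theoretical construction. An EM algorithm for item parameter estimation under the Logistic weighted model is derived, and a corresponding estimation program is written. Simulated tests under the Logistic weighted model show good recovery of the item parameters.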

2.
程小扬  丁树良 《心理科学》2011,34(4):965-969
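In computerized adaptive testing, a-stratified item selection is an efficient and secure strategy for 0-1 scoring models, but it cannot be applied directly to polytomous scoring models because each item has several difficulty/step parameters. The information function integrates all item parameters and the ability parameter well, but the maximum-information selection strategy compromises test security. This paper proposes a variable-weight item selection strategy that calls a function tied to the information: the function is proportional to the information and inversely proportional to a power function of the discrimination, so it both integrates all item parameters and achieves the effect of a-stratification. A Monte Carlo comparison under the GPCM shows that the new selection strategy performs better overall than previously reported results.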
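The abstract above describes a selection index that grows with item information and shrinks with a power of the discrimination. A minimal Python sketch of that idea follows, assuming the standard GPCM information formula (squared discrimination times the variance of the category score). The function names, the exponent `power`, and the two-item pool are illustrative assumptions, not the authors' actual weighting function.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities under the generalized partial credit model."""
    # Exponent for category k is the running sum of a * (theta - b_j), j <= k.
    exps = [0.0]
    for b in steps:
        exps.append(exps[-1] + a * (theta - b))
    m = max(exps)  # subtract the maximum for numerical stability
    weights = [math.exp(e - m) for e in exps]
    total = sum(weights)
    return [w / total for w in weights]

def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the category score."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (second - mean * mean)

def weighted_index(theta, a, steps, power):
    """Hypothetical selection index: information divided by a ** power."""
    return gpcm_information(theta, a, steps) / (a ** power)

def select_item(theta, pool, power):
    """Return the position in pool of the item with the largest index.

    pool is a list of (a, steps) tuples; power is an assumed tuning exponent.
    """
    return max(range(len(pool)),
               key=lambda i: weighted_index(theta, pool[i][0], pool[i][1], power))
```

With `power = 0` the index reduces to maximum-information selection; larger exponents penalize highly discriminating items, which is one way to read the "a-stratified effect" the abstract mentions.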

3.
肖涵敏  杜文久  张婷婷 《心理学报》2011,43(12):1462-1467
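Polytomously scored items are widely used because they provide more information about examinees. This paper first introduces the concept of an item node by means of a polytomously scored mathematics item. Assuming that the probability of a correct response at an item node follows the two-parameter logistic model, it analyses three types of polytomously scored items and derives three scoring models, one of which has the same form as the graded response model. Given the forms of polytomously scored items currently used in testing in China, the item-node method described here allows item scoring models to be formulated in a unified way.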

4.
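Information functions of polytomously scored items in the item response theory framework   Total citations: 1 (self-citations: 0; citations by others: 1)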
杜文久 《心理学报》2006,38(1):135-144
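The aim is to give formulas for the information function of polytomously scored items and to discuss, through several examples, how these information functions are applied in practice. The main results are: (1) the sample space of a test item is introduced through an example; (2) on the basis of the two-parameter logistic model, the probability functions of several kinds of polytomously scored items are discussed and formulas for their information functions are derived; (3) the practical application of the information functions of polytomously scored items is discussed through several examples.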

5.
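Item selection strategies for computerized adaptive testing under the graded response model   Total citations: 7 (self-citations: 3; citations by others: 4)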
陈平  丁树良  林海菁  周婕 《心理学报》2006,38(3):461-467
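Item selection strategies in computerized adaptive testing (CAT) have long attracted researchers in China and abroad, yet selection strategies for polytomously scored CAT have rarely been reported. This study uses computer simulation to examine four CAT item selection strategies under the Graded Response Model. The results show that the strategy that matches the category difficulty values to the current ability estimate receives the highest overall evaluation; that adding a "shadow item bank" to a selection strategy markedly improves the evenness of item usage; and that different item parameter distributions and different ability estimation methods both affect the CAT evaluation indices.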

6.
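Polytomous-attribute cognitive diagnosis models (CDMs) provide more detailed diagnostic feedback than traditional dichotomous-attribute CDMs, but most existing polytomous-attribute CDMs cannot directly analyse polytomously scored (or mixed-format) data. Building on the graded response model, this paper extends the reparameterized polytomous-attribute DINA model to polytomous scoring and develops a graded-response polytomous-attribute DINA model that can handle polytomously scored data. An empirical data analysis first demonstrates that the new model is applicable in practice; a simulation study then examines its parameter recovery. The results show that the new model meets the practical need to handle polytomous attributes and polytomously scored data together and has good psychometric properties, although it places some demands on test quality (e.g., high item quality and a complete test Qp matrix).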

7.
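Item information of the graded response model in item response theory   Total citations: 7 (self-citations: 1; citations by others: 6)
As an important concept in item response theory, the information function is highly valuable for item and test analysis and for guiding test construction. Its application is most central, and has received the most attention, in computerized adaptive testing. However, the properties of the information function for polytomously scored items have been little studied. This study simulated examinee trait-level parameters and item parameters: 121 trait-level points were generated, and item parameters were generated in 4 batches with different discrimination parameters, each batch containing 126 items with different combinations of category difficulty parameters and each item having 5 difficulty levels. The analysis shows that the trait level at which a graded response model item provides maximum information corresponds to a group of several adjacent difficulty levels of that item; it corresponds neither to a single difficulty level nor necessarily to all difficulty levels. The study calls this pattern "dominance of adjacent difficulty levels". The finding offers important guidance for test quality analysis and test construction, including the construction of computerized adaptive tests.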
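The kind of computation this abstract describes can be sketched in Python as follows, assuming Samejima's graded response model in its logistic form without the 1.7 scaling constant. The item parameters and the 121-point grid from -3 to 3 are illustrative assumptions; the abstract does not give the study's actual values.

```python
import math

def boundary_prob(theta, a, b):
    """Probability of scoring at or above one boundary (2PL logistic form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def grm_category_probs(theta, a, bs):
    """Category probabilities; bs holds the increasing boundary difficulties."""
    cum = [1.0] + [boundary_prob(theta, a, b) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def grm_item_information(theta, a, bs):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = [1.0] + [boundary_prob(theta, a, b) for b in bs] + [0.0]
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        # Derivative of a logistic boundary curve is a * P * (1 - P).
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info

# Hypothetical item with 5 difficulty levels, evaluated at 121 grid points.
thetas = [-3.0 + 0.05 * i for i in range(121)]
item_a, item_bs = 1.5, [-2.0, -1.0, 0.0, 1.0, 2.0]
infos = [grm_item_information(t, item_a, item_bs) for t in thetas]
peak_theta = thetas[infos.index(max(infos))]  # trait level with maximum information

if __name__ == "__main__":
    print("information peaks near theta =", round(peak_theta, 2))
```

Repeating the grid search over items whose boundaries are spaced differently is one way to see which boundaries the information peak sits near.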

8.
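When measuring latent traits with a hierarchical structure, standard item response models estimate both item parameters and ability parameters inefficiently. Multidimensional item response models estimate first-order traits efficiently but ignore the hierarchy of traits, so they are unsuitable for hierarchically structured traits. Higher-order item response models, by contrast, estimate item and ability parameters efficiently and accurately and yield higher-order and lower-order traits at the same time. Existing higher-order item response models include the higher-order DINA model, the higher-order two-parameter normal ogive hierarchical model, the higher-order logistic model, the polytomous higher-order item response model, and the higher-order testlet model. Future research should attend to multilevel higher-order item response models, higher-order item response models with within-item multidimensionality, and higher-order cognitive diagnosis models.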

9.
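This paper extends the promising HO-DINA model, which applies only to 0-1 scored items, to polytomously scored items, estimates the model parameters with an MCMC algorithm, and studies the performance of the new model. The findings are: (1) The extended polytomous HO-DINA model has high parameter estimation accuracy and a high correct diagnosis rate. (2) The more attributes the polytomous HO-DINA model diagnoses, the less accurate the estimates of the attribute parameters ( and ) and the s parameters and the lower the attribute diagnosis accuracy (MMR and PRM), but the more accurate the estimates of the ability parameter ( ) and the g parameters. (3) Under the present conditions, to keep the attribute pattern correct classification rate above 80%, no more than 7 attributes should be diagnosed.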

10.
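A new equating method in the item response theory framework: the log-contrast equating method   Total citations: 3 (self-citations: 2; citations by others: 1)
Item response theory includes some polytomous scoring models expressed in division (ratio) form. When the Haebara method, the Stocking-Lord method, or the symmetric relative entropy method is used to equate tests under these models, equating may fail because these methods are demanding about initial values. For this class of models we present a new equating method, the log-contrast equating method. It converges quickly, places low demands on the initial values of the iteration, gives fairly accurate results, and can supply good initial values for other equating methods. The study also shows that the log-contrast equating method improves on and generalizes the Logit-transformation equating method for the 0-1 scored two-parameter Logistic model.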

11.
In test operations using IRT (item response theory), items are included in a test before being used to rate subjects and the response data is used to estimate their item parameters. However, this method of test operation may lead to item content leakage and an adequate test operation can become difficult. To address this problem, Ozaki and Toyoda (2005, 2006) developed item difficulty parameter estimation methods that use paired comparison data from the perspective of the difficulty of items as judged by raters familiar with the field. In the present paper, an improved method of item difficulty parameter estimation is developed. In this new method, an item for which the difficulty parameter is to be estimated is compared with multiple items simultaneously, from the perspective of their difficulty. This is not a one-to-one comparison but a one-to-many comparison. In the comparisons, raters are informed that items selected from an item pool are ordered according to difficulty. The order will provide insight to improve the accuracy of judgment.

12.
Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When sample size is not sufficiently large, the guessing parameters may be omitted from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically, on item difficulty, item discrimination, and mean and variance of ability distribution. Results show that when non-zero guessing parameters are omitted from the measurement invariance analysis, item discrimination estimates tend to decrease particularly for more difficult items, and item difficulty estimates decrease unless the items are highly discriminating and difficult. As the guessing parameter increases, the size of the decrease in item discrimination and difficulty tends to increase, and the estimated mean and variance of ability distribution tend to be inaccurate. When two groups have heterogeneous ability distributions, ignoring the guessing parameter affects the reference group and the focal group differently. Implications of the findings are discussed.

13.
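Estimation of item parameters for raw items in computerized adaptive testing   Total citations: 1 (self-citations: 1; citations by others: 0)
The security of computerized adaptive testing (CAT) faces new challenges, and small item banks are especially vulnerable. How to build a large, high-quality item bank has become a very important topic in CAT research. Current CAT item bank construction has several problems, such as high cost and poor confidentiality; in particular, equating techniques are complex and the repeated use of anchor items easily leads to leakage. If items that have not yet been calibrated (raw items) could be inserted during CAT administration and their parameters estimated at the same time, the value for building large, high-quality CAT item banks would be considerable. Based on the 1PLM and 2PLM, this paper proposes a new method for online estimation of raw items and derives a formula for the initial value of the discrimination parameter a in the iteration. The results show that, in both the simulation study and the empirical study, the number of times a raw item is answered affects the item parameter estimates, and the more examinees answer a raw item, the more accurate the estimates.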

14.
A linking design typically consists of a data collection procedure together with an item linking procedure that places item parameters calibrated from multiple test forms onto a common scale. This study considered 2 potentially useful item response theory linking designs. The first one is characterized by selecting a single set of common items across all multiple test forms, the precalibrated item parameters of which are kept fixed while the unknown parameters of the other items are being estimated. This linking design will be referred to as the fixed common-precalibrated item parameter design. However, data collected under this design could also be analyzed by the characteristic curve method, which constituted an alternative linking procedure. In this study, the relative merits of the 2 linking designs were examined with respect to their robustness against 3 manipulated conditions, namely, when the common items have imprecise estimates, when there is a noticeable difference in the average item difficulty between the common and the noncommon items, and when the examinees are heterogeneous in terms of their abilities. A parameter recovery study was conducted to achieve this purpose. The results indicated that both linking designs were capable of producing accurate linking of items and equivalent estimation of ability parameters under the 3 conditions. When the 2 designs were actually utilized in the development of an item bank, it was found that both linking designs produced quite consistent solutions despite minor differences on some item and ability estimates. Conditions under which one linking design is preferred over the other are also provided in the Discussion section of this article.

15.
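A comparison of item selection strategies for computerized adaptive testing with polytomously scored items   Total citations: 12 (self-citations: 2; citations by others: 10)
This study compares five item selection strategies for computerized adaptive testing with polytomously scored items. The IRT model used is Samejima's graded response model. The strategies compared are matching the mean difficulty to ability, matching the median difficulty to ability, maximum information, and two A-stratified methods. Four comparison criteria are used: bias of the ability estimates relative to the true values, standard deviation of the ability estimates, mean number of items administered per examinee, and standard deviation of item usage counts. A Monte Carlo simulation shows that each method has strengths and weaknesses; with appropriate stratification, the A-stratified method (median) gives the best overall results.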

16.
Various definitions and different approaches for assessing the complex construct of parental involvement (PI) have led to inconsistent findings regarding the impact of PI on child development. To date, limited information is available regarding the measurement invariance of PI measures across time and groups (e.g., children's gender, ethnicity, and socio-economic status), leaving a concern that group differences in PI might reflect item bias instead of true differences in PI. The present study aimed to obtain a set of optimal items for measuring PI from kindergarten through the elementary school years and investigate whether they could be used for parents from different groups. A Rasch measurement model was implemented to investigate item difficulty, step calibrations, and measurement invariance (differential item functioning; DIF, here). The results from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999 data set showed that 20 items can be used to measure three dimensions of PI (school/home involvement, family educational investment, and family routines) across four time points. Administration time, children's gender, ethnicity, and socio-economic status showed different levels of effect on item difficulty for half of these items. Practitioners and researchers should be cautious when using these items and are advised to freely estimate the item parameters of DIF items as well as add more items to the PI scale to improve reliability.

17.
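Parameter estimation currently relies mostly on statistical methods, which are time-consuming and require large examinee samples and many items. This paper combines a BP neural network with a dimension-reduction method to estimate GRM item parameters and examinee ability parameters. Monte Carlo simulation results show that: (1) whether there are many examinees and few items or many items and few examinees, parameter estimation under this network design is fairly accurate; (2) the approach can be applied to items with differing numbers of score categories, and estimation remains fairly accurate even for items with more than 15 categories, which other estimation methods cannot match; (3) running time is greatly reduced compared with statistical estimation methods.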

18.
In this paper, we study the identification of a particular case of the 3PL model, namely when the discrimination parameters are all constant and equal to 1. We term this model the 1PL-G model. The identification analysis is performed under three different specifications. The first specification considers the abilities as unknown parameters. It is proved that the item parameters and the abilities are identified if a difficulty parameter and a guessing parameter are fixed at zero. The second specification assumes that the abilities are mutually independent and identically distributed according to a distribution known up to the scale parameter. It is shown that the item parameters and the scale parameter are identified if a guessing parameter is fixed at zero. The third specification corresponds to a semi-parametric 1PL-G model, where the distribution G generating the abilities is a parameter of interest. It is not only shown that, after fixing a difficulty parameter and a guessing parameter at zero, the item parameters are identified, but also that under those restrictions the distribution G is not identified. It is finally shown that, after introducing two identification restrictions, either on the distribution G or on the item parameters, the distribution G and the item parameters are identified provided an infinite quantity of items is available.
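For readers unfamiliar with the model, the 1PL-G response function can be sketched in Python as the 3PL curve with discrimination fixed at 1. The function and argument names are illustrative assumptions; the abstract itself gives no formula.

```python
import math

def irf_1plg(theta, b, c):
    """Probability of a correct response under the 1PL-G model.

    theta: examinee ability
    b: item difficulty
    c: guessing parameter (lower asymptote), assumed to lie in [0, 1)
    The discrimination is fixed at 1, which distinguishes 1PL-G from 3PL.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("guessing parameter must lie in [0, 1)")
    logistic = 1.0 / (1.0 + math.exp(-(theta - b)))
    return c + (1.0 - c) * logistic
```

Setting `c = 0` recovers the Rasch (1PL) curve, which corresponds to the kind of restriction the abstract imposes when a guessing parameter is fixed at zero.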

19.
Fischer's (1973) linear logistic test model can be used to test hypotheses regarding the effect of covariates on item difficulty and to predict the difficulty of newly constructed test items. However, its assumptions of equal discriminatory power across items and a perfect prediction of item difficulty are never absolutely met. The amount of misfit in an application of a Bayesian version of the model to two subtests of the SON‐R –17 is investigated by means of item fit statistics in the framework of posterior predictive checks and by means of a comparison with a model that allows for residual (co)variance in the item parameters. The effect of the degree of residual (co)variance on the robustness of inferences is investigated in a simulation study.
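The prediction step mentioned in this abstract can be illustrated with a short Python sketch, assuming the usual LLTM decomposition in which an item's difficulty is a weighted sum of basic parameters (the normalization constant is omitted). The design weights and basic parameters in the usage note are made-up values, not estimates from the study.

```python
import math

def lltm_difficulty(q_row, eta):
    """Predicted item difficulty: sum over covariates of q_ik * eta_k."""
    if len(q_row) != len(eta):
        raise ValueError("q_row and eta must have the same length")
    return sum(q * e for q, e in zip(q_row, eta))

def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

For a hypothetical new item that involves the first and third of three cognitive operations, `lltm_difficulty([1, 0, 1], [0.5, -0.2, 1.0])` gives its predicted difficulty, and `rasch_prob` then gives the predicted success probability at any ability level.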
