Similar literature
20 similar documents found
1.
Wendy M. Yen 《Psychometrika》1985,50(4):399-410
When the three-parameter logistic model is applied to tests covering a broad range of difficulty, there frequently is an increase in mean item discrimination and a decrease in variance of item difficulties and traits as the tests become more difficult. To examine the hypothesis that this unexpected scale shrinkage effect occurs because the items increase in complexity as they increase in difficulty, an approximate relationship is derived between the unidimensional model used in data analysis and a multidimensional model hypothesized to be generating the item responses. Scale shrinkage is successfully predicted for several sets of simulated data. The author is grateful to Robert Mislevy for kindly providing a copy of his computer program, RESOLVE.

2.
Applications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models – a testlet partial credit model (PCM), multilevel PCM, and PCM – in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.

3.
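When the observed indicator variables are dichotomous categorical data, traditional factor analysis methods no longer apply. The authors briefly review categorical-data factor analysis models in the SEM framework and models relating test items to latent ability in the IRT framework, and summarize the main parameter estimation methods used in each framework. Two simulation studies compare the GLSc and MGLSc estimation methods in the SEM framework with the MML/EM estimation method in the IRT framework. The results show that: (1) among the three methods, GLSc yields the largest bias in parameter estimates, while MGLSc and MML/EM differ little; (2) the precision of all item parameter estimates improves as sample size increases; (3) the precision of item factor loading and difficulty estimates is affected by test length; (4) the precision of item factor loading and discrimination estimates is affected by the magnitude of the population factor loadings (discrimination); (5) the distribution of thresholds across test items affects estimation precision, with item discrimination affected most; (6) overall, item parameter estimates in the SEM framework are more precise than those in the IRT framework. The article also offers some suggestions on issues to watch for when applying the two approaches in practice.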

4.
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The objective was to provide bounds of the likely DIF effects on these measurement consequences. Five factors were manipulated: test length, percentage of DIF items per form, item type, sample size, and level of group ability difference. Results indicate that the greatest DIF effect was less than 2 points on the 0 to 60 total score scale and about 0.15 on the IRT ability scale. DIF had a limited effect on the ratio of true-score variance to observed-score variance, but its influence on the standard error of estimation for the IRT ability parameter was evident for certain ability values.

5.
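Test scenarios with known item parameters and examinee scores were designed, and ability was estimated under two-, three-, and four-parameter logistic weighted models. The ability steps between adjacent score levels were found to be evenly spaced, and examinee scores reflected the score-weighting effect of polytomous scoring well. Under the two-parameter logistic weighted model, examinee parameter estimates were perturbed: guessing led to overestimation of ability, and slipping led to underestimation. Under the c-type three-parameter logistic weighted model, overestimation of ability did not occur or was not pronounced; under the γ-type three-parameter logistic weighted model, underestimation did not occur or was not pronounced; under the four-parameter logistic weighted model, neither overestimation nor underestimation occurred or was pronounced. The four-parameter logistic weighted model is therefore a good method for robust estimation of examinee ability.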

6.
In this paper, we study the identification of a particular case of the 3PL model, namely when the discrimination parameters are all constant and equal to 1. We call this the 1PL-G model. The identification analysis is performed under three different specifications. The first specification considers the abilities as unknown parameters. It is proved that the item parameters and the abilities are identified if a difficulty parameter and a guessing parameter are fixed at zero. The second specification assumes that the abilities are mutually independent and identically distributed according to a distribution known up to the scale parameter. It is shown that the item parameters and the scale parameter are identified if a guessing parameter is fixed at zero. The third specification corresponds to a semi-parametric 1PL-G model, where the distribution G generating the abilities is a parameter of interest. It is not only shown that, after fixing a difficulty parameter and a guessing parameter at zero, the item parameters are identified, but also that under those restrictions the distribution G is not identified. It is finally shown that, after introducing two identification restrictions, either on the distribution G or on the item parameters, the distribution G and the item parameters are identified provided an infinite quantity of items is available.
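
As a rough illustration of the model this abstract describes, the sketch below evaluates a 3PL item response function with the discrimination fixed at 1. The function name `p_correct_1plg` and the argument names `theta`, `b`, and `c` are assumptions made for this example, and the abstract's own parameterization may differ (for instance, in how the guessing parameter is scaled).

```python
import math


def p_correct_1plg(theta: float, b: float, c: float) -> float:
    """Probability of a correct response under a 3PL model whose
    discrimination is fixed at 1 (assumed parameterization).

    theta: examinee ability
    b:     item difficulty
    c:     lower asymptote (guessing), with 0 <= c < 1
    """
    logistic = 1.0 / (1.0 + math.exp(-(theta - b)))
    return c + (1.0 - c) * logistic


# With c = 0 the model reduces to the ordinary one-parameter logistic curve.
print(p_correct_1plg(theta=0.0, b=0.0, c=0.0))
# With c > 0 the curve is lifted toward the guessing floor.
print(p_correct_1plg(theta=0.0, b=0.0, c=0.2))
```

Under this form, low-ability examinees still answer correctly with probability close to `c`, which is why the guessing parameters need a restriction of their own before the item parameters can be pinned down.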

7.
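Yang Xiangdong 《Acta Psychologica Sinica》2010,42(7):802-812
In automatic item generation (AIG), item parameters are predicted from the set of stimulus features specified by the cognitive item design, so their sources of uncertainty are more complex than those of parameters calibrated from empirical data. This article reports an empirical study of how accurately ability is estimated under computerized adaptive testing when using predicted item parameters for Abstract Reasoning Test (ART) items generated with the cognitive design system approach. The predicted item parameters were distributed closer to the center than the corresponding calibrated parameters. This regression effect both affected the size of the ability estimation error and led to differences in item selection during adaptive testing. After controlling for differences in item selection, the ability estimation error was larger than that based on calibrated item parameters, but the difference was not pronounced. The two sets of ability estimates were highly correlated, and the differences between corresponding ability values were small across almost the entire ability range.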

8.
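An item response theory analysis of the Combined Raven's Test   Total citations: 1, self-citations: 0, citations by others: 1
Combined Raven's Test data from 354 young men of varying ability levels were analyzed with BILOG-MG 3.0, using marginal maximum likelihood estimation and the three-parameter logistic model. Most items fit the three-parameter logistic model (6 items did not). The test information function peaked between -3 and -2 on the difficulty scale, with a peak value of 16.82. Eighteen items had information function peaks below 0.2. All 72 items had discrimination above 0.5, which is fairly satisfactory. The difficulty parameters were low for all items: the great majority were below 0, and the highest was only 1.01. Item difficulty was determined mainly by the level of mental operation required. The pseudo-guessing parameters ranged from 0.07 to 0.24. Taken together, the analysis indicates that the Combined Raven's Test measures the intelligence of normal young adults with poor precision.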

9.
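For dual-objective CD-CAT, six item discrimination indices (the discrimination index D, the general discrimination index GDI, the odds ratio OR, the 2PL discrimination parameter a, the attribute discrimination index ADI, and the cognitive diagnostic index CDI) were each combined with the IPA method to obtain new item selection strategies. A simulation study compared their performance and also examined how well stratification by discrimination controls item exposure. The new methods all clearly improved the classification accuracy of knowledge states and the precision of ability estimates, and all stratified selection strategies substantially improved item pool utilization. Overall, OR weighting significantly improved measurement precision, and OR-stratified selection significantly improved the uniformity of item exposure while maintaining measurement precision.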

10.
Generating items during testing: Psychometric issues and models   Total citations: 2, self-citations: 0, citations by others: 2
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well-established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented. This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas. —Editor

11.
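Using a cognitive ability test for children aged 4 to 5 as an example, this study examines how to analyze measurement invariance in longitudinal data within the IRT framework. The analysis used a between-item multidimensional item response theory (MIRT) model and a within-item multidimensional two-tier model. The participants were 882 children aged 48 months from across the country, and the instrument was a self-developed cognitive ability test for children aged 4 to 5. Test-level and item-level analyses showed that: (1) the proposed approach to analyzing measurement invariance in longitudinal data is reasonable and effective; (2) the test satisfies partial measurement invariance across the two time points, and its latent structure is stable; (3) both the discrimination and difficulty parameters of the "spatial orientation" item changed, and the difficulty parameters of four other items drifted; (4) children's cognitive ability developed rapidly overall between ages 4 and 5, with significant growth in ability.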

12.
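The four-parameter logistic model can be used to analyze data affected by the "sleeping" phenomenon, in which high-ability examinees answer easy items incorrectly. This study fitted real data from psychological tests and achievement tests with both traditional models and the four-parameter logistic model, and compared the fit indices and parameter estimates across models. The four-parameter logistic model improved model fit, increased the accuracy of the estimates, and effectively corrected the underestimation of high-ability examinees' ability. The authors recommend using the four-parameter logistic model for data analysis when necessary.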
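
To make the role of the fourth parameter concrete, the sketch below evaluates the usual four-parameter logistic item response function, in which an upper asymptote below 1 leaves room for careless errors by able examinees. The function name `p_correct_4pl`, the symbols `a`, `b`, `c`, `d`, and the numeric values are assumptions chosen for illustration; they are not taken from the study.

```python
import math


def p_correct_4pl(theta: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic item response function (assumed standard form).

    theta: examinee ability
    a:     discrimination
    b:     difficulty
    c:     lower asymptote (guessing)
    d:     upper asymptote (d = 1 gives the three-parameter model)
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic


# A high-ability examinee (theta = 4) on an easy item (b = -1).
p_3pl = p_correct_4pl(theta=4.0, a=1.5, b=-1.0, c=0.2, d=1.0)
p_4pl = p_correct_4pl(theta=4.0, a=1.5, b=-1.0, c=0.2, d=0.95)

# Probability of an incorrect answer under each model.
print(1.0 - p_3pl, 1.0 - p_4pl)
```

With `d = 1` a wrong answer from this examinee is treated as a nearly impossible event, so a single slip pulls the ability estimate down sharply; with `d` slightly below 1 the same slip is far less surprising to the model.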

13.
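Liu Hongyun, Luo Fang, Wang Yue, Zhang Yu 《Acta Psychologica Sinica》2012,44(1):121-132
The authors briefly review categorical confirmatory factor analysis (CCFA) models in the SEM framework and models relating test items to latent ability in the MIRT framework, and summarize the main parameter estimation methods used in each framework. A simulation study compares the WLSc and WLSMV estimation methods in the SEM framework with the MLR and MCMC estimation methods in the MIRT framework. The results show that: (1) WLSc yields the largest bias in parameter estimates and also has convergence problems; (2) the precision of all item parameter estimates improves as sample size increases, the precision of WLSMV and MLR estimates differs very little, and in most cases both are no worse than MCMC; (3) except for WLSc, estimation precision increases steadily as the number of items per dimension grows; (4) the number of test dimensions has a relatively large effect on discrimination and difficulty parameters and a relatively small effect on item factor loadings and thresholds; (5) the precision of item parameter estimates depends on how many dimensions an item measures, with items that measure a single dimension estimated more precisely. The article also offers some suggestions on issues to watch for when applying the two approaches in practice.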

14.
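CD-CAT, the product of combining CDA with CAT, is suited to classroom teaching and is an important tool for teachers' remedial instruction and students' self-directed learning. The initial-stage item selection method, an important component of CD-CAT, strongly influences the classification accuracy of the test. Building on existing research and on CDA item discrimination indices, this paper proposes four new initial-stage item selection methods: CTTID, CDI, CTTIDR*, and CDIR*. Simulation studies show that in fixed-length CD-CAT with an HD-HV item pool, at the end of the initial stage the PCCR of the CTTIDR* method was .2999 higher than that of the existing T-matrix method and .1707 higher than that of PWKL; the trend was the same for the other item pools. At the end of the whole test, the CTTIDR* method still had the highest classification accuracy. In variable-length CD-CAT, with maximum posterior probability thresholds of .7, .8, and .9, the average test length under CTTIDR* was shorter than under the T-matrix method by 2.6170, 2.2347, and 1.7470 items, respectively.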

15.
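Ma Jie, Liu Hongyun 《Journal of Psychological Science》2018,(6):1374-1381
Using real data from a high school English reading test, this study compares the two-parameter logistic model (2PL-IRT) with two-parameter logistic testlet models (2PL-TRT) that include different numbers of testlets, in order to examine how the number of testlets affects parameter estimation and model fit. The results show that: (1) under the 2PL-IRT model, ability estimates are substantially biased for examinees with ability between -1.50 and 0.50; (2) treating the items of testlets whose testlet effect exceeds 0.50 as locally independent leads to underestimation of the discrimination parameters of some items and overestimation of the difficulty parameters of most items; (3) the larger the testlet effect, the larger the bias in item parameter estimates when the testlet's items are treated as locally independent.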

16.
Information functions are used to find the optimum ability levels and maximum contributions to information for estimating item parameters in three commonly used logistic item response models. For the three- and two-parameter logistic models, examinees who contribute maximally to the estimation of item difficulty contribute little to the estimation of item discrimination. This suggests that in applications that depend heavily upon the veracity of individual item parameter estimates (e.g. adaptive testing or test construction), better item calibration results may be obtained (for fixed sample sizes) from examinee calibration samples in which ability is widely dispersed. This work was supported by Contract No. N00014-83-C-0457, project designation NR 150-520, from Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research and Educational Testing Service through the Program Research Planning Council. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to acknowledge the invaluable assistance of Maxine B. Kingston in carrying out this study, and to thank Charles Lewis for his many insightful comments on earlier drafts of this paper.
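
The trade-off described in this abstract can be seen in a small sketch for the two-parameter logistic model only. It computes, for a single examinee, the diagonal Fisher information terms for an item's difficulty and discrimination, assuming the form P = 1 / (1 + exp(-a(θ - b))) with no scaling constant. The function name `item_param_information` and this simplified setup are assumptions for illustration, not the paper's own derivation.

```python
import math


def item_param_information(theta: float, a: float, b: float) -> dict:
    """Per-examinee Fisher information about a 2PL item's parameters.

    For a Bernoulli response with logit eta = a * (theta - b), the
    information about a parameter is (d eta / d parameter)^2 * P * (1 - P).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    pq = p * (1.0 - p)
    return {
        "difficulty": (a ** 2) * pq,                # d eta / d b = -a
        "discrimination": ((theta - b) ** 2) * pq,  # d eta / d a = theta - b
    }


# An examinee located exactly at the item difficulty is the most informative
# about b and carries no information about a.
print(item_param_information(theta=0.0, a=1.0, b=0.0))
# An examinee two units away tells us less about b but more about a.
print(item_param_information(theta=2.0, a=1.0, b=0.0))
```

Because the two terms peak at different ability levels, a calibration sample concentrated near the item difficulty would estimate `b` well and `a` poorly, which is consistent with the abstract's recommendation of widely dispersed ability.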

17.
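A comparison of item selection strategies for computerized adaptive testing with polytomously scored items   Total citations: 12, self-citations: 2, citations by others: 10
This study compares five item selection strategies for computerized adaptive testing with polytomously scored items. The IRT model used is Samejima's graded response model. The strategies compared are matching mean item difficulty to ability, matching median item difficulty to ability, maximum information, and two a-stratified methods. Four evaluation criteria were used: the bias of ability estimates relative to true values, the standard deviation of ability estimates, the average number of items administered per examinee, and the standard deviation of item usage counts. Monte Carlo simulation showed that each method has strengths and weaknesses; with appropriate stratification, the a-stratified method (median) gave the best overall results.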

18.
Count data naturally arise in several areas of cognitive ability testing, such as processing speed, memory, verbal fluency, and divergent thinking. Contemporary count data item response theory models, however, are not flexible enough, especially to account for over- and underdispersion at the same time. For example, the Rasch Poisson counts model (RPCM) assumes equidispersion (conditional mean and variance coincide) which is often violated in empirical data. This work introduces the Conway–Maxwell–Poisson counts model (CMPCM) that can handle underdispersion (variance lower than the mean), equidispersion, and overdispersion (variance larger than the mean) in general and specifically at the item level. A simulation study revealed satisfactory parameter recovery at moderate sample sizes and mostly unbiased standard errors for the proposed estimation approach. In addition, plausible empirical reliability estimates resulted, while those based on the RPCM were biased downwards (underdispersion) and biased upwards (overdispersion) when the simulation model deviated from equidispersion. Finally, verbal fluency data were analysed and the CMPCM with item-specific dispersion parameters fitted the data best. Dispersion parameter estimates indicated underdispersion for three out of four items. Overall, these findings indicate the feasibility and importance of the suggested flexible count data modelling approach.
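
The three dispersion regimes named in this abstract can be checked numerically for the plain Conway–Maxwell–Poisson distribution, whose probability mass is proportional to λ^x / (x!)^ν. The sketch below is only an illustration of the distribution, not of the CMPCM item response model itself; the function name `cmp_mean_var`, the truncation point `max_count`, and the parameter values are assumptions for this example.

```python
import math


def cmp_mean_var(lam: float, nu: float, max_count: int = 500) -> tuple:
    """Mean and variance of a Conway-Maxwell-Poisson distribution,
    computed by truncating the support at max_count (an assumption that
    is adequate for the small rates used here).

    Unnormalized weights are lam**x / (x!)**nu, handled in log space
    to avoid overflow.
    """
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(max_count + 1)]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    mean = sum(x * wi for x, wi in enumerate(w)) / z
    var = sum((x - mean) ** 2 * wi for x, wi in enumerate(w)) / z
    return mean, var


# nu = 1 is the Poisson case (equidispersion); nu > 1 gives underdispersion
# and nu < 1 gives overdispersion.
for nu in (1.0, 2.0, 0.5):
    mean, var = cmp_mean_var(lam=3.0, nu=nu)
    print(nu, round(mean, 3), round(var, 3))
```

A single extra parameter `nu` therefore covers all three cases, which is what allows dispersion to be estimated separately for each item.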

19.
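Testlets are a common item format in many tests, but the dependence among items within a testlet violates the local independence assumption, so estimating parameters with standard item response models produces substantial bias. Testlet response theory incorporates the interaction between examinee and testlet into the model, which addresses the dependence among items. The authors review the development, basic principles, and related research of testlet response theory and apply it to a secondary school English examination. Compared with item response theory, the results show that: (1) the parameter estimates from the testlet response model and the item response model are strongly correlated, especially for the ability and difficulty parameters; (2) the confidence intervals from the testlet response model are narrower than those from the item response model for every parameter, that is, the testlet response model estimates the parameters more precisely.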

20.
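Estimating item parameters of raw items in computerized adaptive testing   Total citations: 1, self-citations: 1, citations by others: 0
The security of computerized adaptive testing (CAT) faces new challenges, and small item pools are especially at risk. How to build a large, high-quality item pool has become a very important topic in CAT research. Current approaches to building CAT item pools have several problems, such as high cost and poor confidentiality. In particular, equating techniques are complex, and repeated use of anchor items easily leads to item leakage. If items whose parameters have not yet been estimated (raw items) could be inserted while a CAT is being administered, and their parameters estimated at the same time, the value for building large, high-quality CAT item pools would be obvious. This paper studies the problem under the 1PLM and 2PLM, proposes a new method for online estimation of raw items, and derives a formula for the initial iteration value of the discrimination parameter a. In both the simulation study and the empirical study, the number of times a raw item was answered affected the item parameter estimates, and the more examinees answered a raw item, the more precise its parameter estimates were.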
