首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
计算机形式的测验能够记录考生在测验中的题目作答时间(Response Time, RT),作为一种重要的辅助信息来源,RT对于测验开发和管理具有重要的价值,特别是在计算机化自适应测验(Computerized Adaptive Testing, CAT)领域。本文简要介绍了RT在CAT选题方面应用并作以简评,分析了这些技术在实践中的可行性。最后,探讨了当前RT应用于CAT选题存在的问题以及可以进一步开展的研究方向。  相似文献   

2.
本研究开发了两种新的适用于多级评分项目的多维计算机化自适应测验(PMCAT)的选题策略——修正的连续熵(RCEM)和修正的后验期望KL信息(MKB)方法,并与以往PMCAT的选题策略进行了对比研究。Monte Carlo实验结果表明:两种新开发的选题策略比原方法估计精度更高,并且RCEM方法在所有选题策略中曝光率最低。新开发的选题策略具有较理想的估计精度和曝光控制效果,为PMCAT在实践中的应用提供了新的方法支持。  相似文献   

3.
计算机化自适应测验选题策略述评   总被引:2,自引:0,他引:2  
毛秀珍  辛涛 《心理科学进展》2011,19(10):1552-1562
计算机化自适应测验(computerized adaptive testing, CAT)是基于测量理论和计算机技术的一种测验模式。它根据考生的作答反应自适应地选择测验项目。选题策略是CAT的重要组成部分之一, 关系到测量效率、测验安全和测验信、效度等重要问题。根据CAT是否具有非统计约束对传统CAT和认知诊断CAT的选题策略进行了分类介绍, 未来研究应进一步提高选题策略的综合表现、深入探讨多级评分项目和认知诊断CAT的选题策略。  相似文献   

4.
等级反应模型下计算机化自适应测验选题策略   总被引:4,自引:3,他引:4  
陈平  丁树良  林海菁  周婕 《心理学报》2006,38(3):461-467
计算机化自适应测验(CAT)中的选题策略,一直是国内外相关学者关注的问题。然而对多级评分的CAT的选题策略的研究却很少报导。本研究采用计算机模拟程序对等级反应模型(Graded Response Model)下CAT的四种选题策略进行研究。研究表明:等级难度值与当前能力估计值匹配选题策略的综合评价最高;在选题策略中增设 “影子题库”可以明显提高项目调用的均匀性;并且不同的项目参数分布或不同的能力估计方法都对CAT评价指标有影响  相似文献   

5.
提出了两种适用于定长CD-CAT的题目曝光控制方法(HIRP、HIRT),这些方法在保证较高分类准确率的同时还有较合理的题目曝光率,新方法由二分化方法和RP及RT方法进行结合并适当调整而得到。模拟研究比较了其与RP、RT、SM、SMIE、RHA和SDBS的表现,结果表明: (1)HIRP的分类准确率和题目曝光率均好于SM、SMIE和SDBS;(2)HIRT的题目曝光率较RP、SM、SMIE、RHA和SDBS稍差,但分类准确率更高;(3)HIRP的分类准确率低于RT和RP,但题目曝光控制要更好。  相似文献   

6.
陈平  辛涛 《心理学报》2011,43(7):836-850
项目的增补对认知诊断计算机化自适应测验(CD-CAT)题库的开发与维护至关重要。借鉴单维项目反应理论(IRT)中联合极大似然估计方法(JMLE)的思路, 提出联合估计算法(JEA), 仅依赖被试在旧题和新题上的作答反应联合地、自动地估计新题的属性向量和新题的项目参数。研究结果表明:当项目参数相对较小且样本量相对较大时, JEA算法在新题属性向量和新题项目参数估计精度方面表现不错; 而且样本大小、项目参数大小以及项目参数初值都影响着JEA算法的表现。  相似文献   

7.
An important design feature in the implementation of both computerized adaptive testing and multistage adaptive testing is the use of an appropriate method for item selection. The item selection method is expected to select the most optimal items depending on the examinees’ ability level while considering other design features (e.g., item exposure and item bank utilization). This study introduced collaborative filtering (CF) as a new method for item selection in the on-the-fly assembled multistage adaptive testing framework. The user-based CF (UBCF) and item-based CF (IBCF) methods were compared to the maximum Fisher information method based on the accuracy of ability estimation, item exposure rates, and item bank utilization under different test conditions (e.g., item bank size, test length, and the sparseness of training data). The simulation results indicated that the UBCF method outperformed the traditional item selection methods regarding measurement accuracy. Also, the IBCF method showed the most superior performance in terms of item bank utilization. Limitations of the current study and the directions for future research are discussed.  相似文献   

8.
毛秀珍  辛涛 《心理学报》2014,46(12):1910-1922
项目曝光控制和内容约束关系到测验安全、测验的信度和效度, 是计算机化自适应测验(Computerized Adaptive Testing, CAT)中两类重要的非统计约束条件。本文在认知诊断CAT中针对内容约束和项目曝光控制要求, 运用5种方法选择测验项目。它们分别是:(1) Monte Carlo方法与项目合格方法相结合, 记为MC-IE; (2) Monte Carlo方法与最大优先指标方法相结合, 记为MC-MPI; (3) Monte Carlo方法与限制阈值方法相结合, 记为MC-RT; (4) Monte Carlo方法与限制进度指标方法相结合, 记为MC-RPG以及(5) Monte Carlo方法与最大后验概率方法相结合, 记为MC-PP。然后通过在线性、收敛、发散、无结构和独立五种属性结构下构建题库并运用重参化融融统和模型模拟被试反应比较它们的选题表现。研究发现, (1) 相同选题方法在不同属性结构下项目曝光率的分布类似, 测量精度按线性、收敛、发散、无结构和独立结构的顺序依次降低; (2) 相同属性结构下, 不同方法的测量精度高低依次为MC-PP、MC-IE、MC-RT、MC-MPI和MC-RPG方法; 项目曝光均匀性优劣依次为MC-RPG、MC-MPI、MC-RT、MC-IE和MC-PP方法。统一量纲值表明, MC-RPG方法的综合表现最好, MC-MPI方法的表现次之。  相似文献   

9.
毛秀珍  辛涛 《心理学报》2013,45(6):694-703
项目曝光率关系到题库建设和测验安全,是计算机化自适应测验(Computerized Adaptive Testing, CAT)需要考虑的重要问题。在认知诊断 CAT 情形下,首先基于传统 CAT 中 a-分层方法的思想提出按项目信息量对题库分层的分层多阶段(Stratified Multistage, SM)选题方法;然后将 SM 方法与项目合格(Item Eligibility, IE)方法相结合得到SMIE方法。在此基础上,开展模拟研究比较SM、IE、SMIE、最大修正优先指标(Maximum Modified Priority Index, MMPI)方法、限制阈值(Restrictive Threshold, RT)方法和限制进度(Restrictive Progressive, RPG)方法的选题表现。总体上,它们的测量精度从高到低依次为IE、SM、SMIE、RT、RPG和MMPI方法;项目曝光分布均匀性的优劣次序为MMPI、RPG、SMIE、RT、SM和IE方法;SMIE和RT方法能较好地平衡测量精度和项目曝光均匀性要求。  相似文献   

10.
计算机化自适应诊断测验中原始题的属性标定   总被引:2,自引:0,他引:2  
认知诊断测验项目开发成本较高, 要标定大量项目的属性相当费时费力, 专家完成这一任务也比较困难。对于在计算机化自适应诊断测验中的项目属性的标定尚未见到报导。在已有的为诊断测验开发的小型题库基础上, 本文在计算机化自适应认知诊断测验过程中, 植入原始题, 对项目属性标定的问题进行探讨, 重点研究原始题属性标定的方法及其影响因素, 除了MMLE方法和MLE方法外, 还建立了一种新的可用于所有非补偿认知诊断模型的属性标定的方法—— 交差方法。Monte Carlo模拟结果显示, MMLE方法较MLE方法好; 在知识状态估计精度较高时, 自适应植入原始题较随机植入原始题有一定的优势; 随着知识状态估计精度提高和原始题作答次数增加, 交差方法与MLE方法基本相当, 只是在发散型和无结构型表现欠佳, 但是交差方法不需要预先设定项目参数值。  相似文献   

11.
尽管多阶段测验(MST)在保持自适应测验优点的同时允许测验编制者按照一定的约束条件去建构每一个模块和题板,但建构测验时若因忽视某些潜在的因素而导致题目之间出现局部题目依赖性(LID)时,也会对MST测验结果带来一定的危害。为探究"LID对MST的危害"这一问题,本研究首先介绍了MST和LID等相关概念;然后通过模拟研究比较探讨该问题,结果表明LID的存在会影响被试能力估计的精度但仍为估计偏差较小,且该危害不限于某一特定的路由规则;之后为消除该危害,使用了题组反应模型作为MST施测过程中的分析模型,结果表明尽管该方法能够消除部分危害但效果有限。这一方面表明LID对MST中被试能力估计精度所带来的危害确实值得关注,另一方面也表明在今后关于如何消除MST中由LID造成危害的方法仍值得进一步探究的。  相似文献   

12.
毛秀珍  刘欢  唐倩 《心理科学》2019,(1):187-193
双因子模型假设测验考察一个一般因子和多个组因子,符合很多教育和心理测验的因素结构。“维度缩减”方法将参数估计中多维积分计算化简为多个迭代二维积分,是双因子模型的重要特征。本文针对考察多级评分项目的计算机化自适应测验,首先推导双因子等级反应模型下Fisher信息量的计算,然后推导“维度缩减”方法在项目选择方法中的应用,最后在低、中、高双因子模式题库中比较D-优化方法、后验加权Fisher信息D优化方法(PDO)、后验加权Kullback-Leibler方法(PKL)、连续熵(CEM)和互信息(MI)方法在能力估计的相关、均方根误差、绝对值偏差和欧氏距离的表现。模拟研究表明:(1)双因子模式越强,即一般因子和组因子在项目上的区分度的差异越小,一般因子估计精度降低,组因子估计精度增加,整体能力的估计精度提高;(2)相同实验条件下,连续熵方法的测量精度最高,PKL方法的能力估计精度最低,其它方法的测量精度没有显著差异。  相似文献   

13.
郭磊  王卓然  王丰  边玉芳 《心理学报》2014,46(5):702-713
测验安全和题库使用率在计算机化自适应测验中十分重要, 特别是高风险测验。传统的SHGT法兼具同时控制项目曝光率和广义测验重叠率的功能, 但题库使用率较差。a分层法能够提高题库使用率, 但对过度曝光的项目控制不足。本研究将a分层法的思想与SHGT法相结合, 各取所长, 提出了3种新的选题方法:SHGT_a法, SHGT_b法和SHGT_c法。研究结果表明:(1)与SHGT法相比, 新方法均可以在有效地控制项目曝光率和广义测验重叠率同时, 极大地提高题库使用率; (2)随着预设项目曝光率(rmax)和广义测验重叠率( )取值的增大以及共享人数a的减小, 新方法对被试能力估计的精度呈上升趋势。比起SHGT法, 新方法仍能保持很高的题库使用率; (3)当区分度和难度的相关(rab)较大时, SHGT_b和SHGT_c法在能力估计精度方面优于SHGT_a法; (4)在不同的测验考察内容比例下, 3种新方法对被试能力估计的精度均较好; (5)与SHGT法相比, 新方法能够有效地控制项目曝光率过度控制的问题。  相似文献   

14.
陈平  辛涛 《心理学报》2011,43(6):710-724
项目增补对认知诊断计算机化自适应测验(CD-CAT)中的题库维护至关重要。在传统CAT中, 在线标定方法经常用于估计新题的项目参数。然而直到现在, 在CD-CAT领域还没有任何关于在线标定的论文公开发表。为将传统CAT中3种有代表性的在线标定方法(Method A、OEM和 MEM)推广至CD-CAT (CD-Method A、CD-OEM和CD-MEM)建立分析基础, 并采用模拟方法对这3种方法进行比较。研究表明:CD-Method A方法在项目参数的返真性方面优于其它两种方法; 自适应标定设计较随机标定设计可以提高项目参数的返真质量。  相似文献   

15.
Abstract

Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as using likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small sample sizes despite the fact that many social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity—the number of model parameters estimated simultaneously—affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic IRT model, and the 2-parameter logistic IRT model. We expected that logistic regression with sum scores and the 1-parameter logistic IRT model would more accurately estimate DIF because these models yielded more stable estimates despite being misspecified. Indeed, a simulation study and empirical example of adolescent substance use show that, even when data are generated from / assumed to be a 2-parameter logistic IRT, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling for Type I error. We also provide evidence for minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.  相似文献   

16.
Interim assessment occurs throughout instruction to provide feedback about what students know and have achieved. Different from the current available cognitive diagnostic computerized adaptive testing (CD-CAT) design that focuses on assessment at a single time point, the authors discuss several designs of interim CD-CAT that are suitable in the learning context. The interim CD-CAT differs from the current available CD-CAT designs primarily because students’ mastery profile (i.e., skills mastery) changes due to learning, and new attributes are added periodically. Moreover, hierarchies exist among attributes taught sequentially and such information could be used during item selection. Two specific designs are considered: The first one is when new attributes are taught in Stage II, but the student mastery status of the previously taught attributes stays the same. The second design is when both new attributes are taught, and previously taught attributes can be further learned or forgotten in Stage II. For both designs, the authors propose an individual prior, which considers a person’s learning history and population learning model, to start an interim CD-CAT. Simulation results show that the Stage II CD-CAT using individual prior outperforms the methods using population priors. The GDINA (generalized deterministic inputs, noisy, “and” gate) diagnostic index (GDI) is extended to accommodate item hierarchies, and analytic results are provided to further illustrate the types of items that are most popular during item selection. As the first study that focuses on the application of CD-CAT in a learning context, the methods and results present herein showed the great promise of using CD-CAT to monitor learning.  相似文献   

17.
Computerized classification testing (CCT) commonly chooses items maximizing information at the cut score, which yields the most information for decision-making. However, a corollary problem is that all examinees will be given the same set of items, resulting in high test overlap rate and unbalanced item bank usage, which threatens test security. Moreover, another pivotal issue for CCT is time control. Since both the extremely long response time (RT) and large RT variability across examinees intensify time-induced anxiety, it is crucial to reduce the number of examinees exceeding the time limitation and the differences between examinees' test-taking times. To satisfy these practical needs, this paper proposes the novel idea of stage adaptiveness to tailor the item selection process to the decision-making requirement in each step and generate fresh insight into the existing response time selection method. Results indicate that a balanced item usage as well as short and stable test times across examinees can be achieved via the new methods.  相似文献   

18.
对从HSK题库中计算机自动生成试卷稳定性的试验检验   总被引:1,自引:0,他引:1  
由计算机从题库中自动生成的试卷能否保持难度的相对稳定?根据IRT进行的等值误差范围有多大?为了回答这些问题,本文以共同组等值作为标准,对基于IRT之上的共同题等值误差进行了试验检验。试验中,采取一定措施保证了考生的动机水平。结果显示,IRT等值的校正方向都是正确的。在4个分测验中有3个分测验的的等值校正效果较理想,1个分测验的等值校正效果不够理想。计算机自动生成的试卷与原有人工命制的试卷在得分方面比较一致,分数相关达到0.931,获得证书的情况也是比较一致的。  相似文献   

19.
郭磊  郑蝉金  边玉芳 《心理学报》2015,47(1):129-140
本研究借鉴传统计算机化自适应测验的思想, 并结合认知诊断的特点, 在认知诊断框架下提出了4种变长CD-CAT的终止规则, 分别是属性标准误法(SEA)、邻近后验概率之差法(DAPP)、二等分法(HA)以及混合法(HM)。在未控制曝光和采用不同曝光控制条件下, 与HSU法及KL法进行了比较。研究结果表明:(1) 终止条件越严格, 平均测验长度越长, 按测验长度最大值终止的测验百分比越大, 模式判准率越高。(2) 当未加入曝光控制时, 4种新的终止规则均有较好表现, 与HSU法十分接近。随着最大后验概率预设值的增加或e的减小, 模式判准率呈上升趋势, 平均测验长度逐渐增加, 但在题库使用率方面均较差。(3) 当加入项目曝光控制时, 6种变长终止规则下的题库使用率有了极大的提升, 仍能保持较高的模式判准率, 并且不同的曝光控制方法对终止规则的影响是不同的。其中, 相对标准终止规则极易受到曝光控制方法的影响。(4) 综合来看, SEA、HM以及HA法在各项指标上的表现与HSU法基本一致, 其次为KL法和DAPP法。  相似文献   

20.
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized.Portions of this paper were presented at the 60th annual meeting of the Psychometric Society, Minneapolis, Minnesota, June, 1995. The author is indebted to Wim M. M. Tielen for his computational support.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号