1.

Incorporating randomness in the Fisher information for improving item‐exposure control in CATs

Juan Ramón Barrada, Julio Olea, Vicente Ponsoda, Francisco José Abad 《The British journal of mathematical and statistical psychology》 2008,61(2):493-513

The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item‐exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive method (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
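The weighting idea in this abstract can be pictured with a minimal Python sketch. It assumes a 2PL model, a weight that grows linearly with the item position, and a random component drawn uniformly between 0 and the largest available information. The function names and the linear weight are illustrative assumptions, not the specific functions studied in the paper.

```python
import math
import random


def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item (assumed model) at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def progressive_select(theta, bank, administered, position, test_length, rng):
    """Return the index of the next item.

    bank: list of (a, b) tuples; administered: set of indices already used;
    position: 1-based position of the item about to be administered.
    """
    # Assumed linear weight: 0 for the first item, 1 for the last one.
    w = (position - 1) / (test_length - 1) if test_length > 1 else 1.0
    available = [i for i in range(len(bank)) if i not in administered]
    info = {i: fisher_info_2pl(theta, *bank[i]) for i in available}
    max_info = max(info.values())
    # Each item gets a random draw mixed with its information.
    score = {
        i: (1.0 - w) * rng.uniform(0.0, max_info) + w * info[i]
        for i in available
    }
    return max(score, key=score.get)


bank = [(0.5, 0.0), (2.0, 0.0), (1.0, 1.0)]
rng = random.Random(0)
first = progressive_select(0.0, bank, set(), 1, 3, rng)     # purely random
last = progressive_select(0.0, bank, set(), 3, 3, rng)      # maximum information
```

With this weight the first item is chosen at random and the last one by maximum information alone; intermediate positions blend the two.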

2.

Varying the valuating function and the presentable bank in computerized adaptive testing

Barrada JR, Abad FJ, Olea J 《The Spanish journal of psychology》 2011,14(1):500-508

In computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, which valuates the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test. In this study we expand the idea of combining the matching criterion with the Fisher information function. We also manipulate the number of strata into which the bank is divided. We find that manipulating the number of items administered with each function makes it possible to move from the pole of high accuracy and low security to the opposite pole. It is possible to greatly improve item bank security with much smaller losses in accuracy by selecting several items with the matching criterion. In general, it seems more appropriate not to stratify the bank.
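A minimal Python sketch of switching the valuating function during the test follows. It assumes a 2PL model, in which an item's information peaks at its difficulty b, so the matching criterion reduces to the distance |theta - b|. The parameter `n_matching` (how many early items use the matching criterion) and the function names are hypothetical.

```python
import math


def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item (assumed model) at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(theta, bank, administered, position, n_matching):
    """Use the matching criterion for the first n_matching items, then Fisher information.

    bank: list of (a, b) tuples; position: 1-based position in the test.
    """
    available = [i for i in range(len(bank)) if i not in administered]
    if position <= n_matching:
        # Matching criterion: item whose information peak (b under 2PL) is closest to theta.
        return min(available, key=lambda i: abs(theta - bank[i][1]))
    # Fisher information: item that is most informative at theta.
    return max(available, key=lambda i: fisher_info_2pl(theta, *bank[i]))


bank = [(2.0, 0.5), (0.8, 0.1), (1.5, -1.0)]
early = select_item(0.0, bank, set(), position=1, n_matching=2)
late = select_item(0.0, bank, set(), position=3, n_matching=2)
```

Raising `n_matching` moves the rule toward the security pole, and lowering it moves the rule toward the accuracy pole.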

3.

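Dynamic and comprehensive item selection strategies for computerized adaptive testing with polytomous scoring (Total citations: 1; self-citations: 0; citations by others: 1)

罗芬, 丁树良, 王晓庆 《心理学报》 2012,44(3):400-412

Polytomous scoring provides more information about examinees and is one direction of development for computerized adaptive testing (CAT); item selection strategy is a central research topic in CAT. For the graded response model, this paper uses the idea of interval estimation to improve several recently proposed item selection strategies, and generalizes the dichotomous b-STR and a-STR methods to polytomous scoring in order to improve the maximum information item selection strategy. Monte Carlo simulations show that, while matching or approaching the measurement precision of the original strategies, some of the new strategies effectively reduce test length and others greatly reduce item exposure rates.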

4.

Maximum information stratification method for controlling item exposure in computerized adaptive testing

Barrada JR, Mazuela P, Olea J 《Psicothema》 2006,18(1):156-159

The proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
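The a-stratified idea described in this abstract can be sketched in a few lines of Python. The sketch assumes equal-size strata, equal-length stages, and selection within the active stratum by the smallest distance between b and the current trait estimate. The function name and the handling of stage boundaries are illustrative assumptions, and the MIS and b-blocking variants are not shown.

```python
def a_stratified_select(theta, bank, administered, position, test_length, n_strata):
    """Return the index of the next item under a simple a-stratified rule.

    bank: list of (a, b) tuples; position: 1-based position in the test.
    Raises ValueError if the active stratum has no unused items.
    """
    # Order items by discrimination a, lowest first.
    order = sorted(range(len(bank)), key=lambda i: bank[i][0])
    # Stage of the test (0-based); later stages use strata with higher a.
    stage = min(n_strata - 1, (position - 1) * n_strata // test_length)
    start = stage * len(order) // n_strata
    end = (stage + 1) * len(order) // n_strata
    stratum = [i for i in order[start:end] if i not in administered]
    # Within the stratum, pick the item whose difficulty is closest to theta.
    return min(stratum, key=lambda i: abs(theta - bank[i][1]))


bank = [(0.5, 0.0), (0.6, 1.0), (1.0, 0.2), (1.1, -0.5), (1.8, 0.0), (2.0, 0.9)]
picks = [a_stratified_select(0.8, bank, set(), pos, 3, 3) for pos in (1, 2, 3)]
```

In this toy bank the three picks come from the low, middle, and high discrimination strata in turn, which is what spreads exposure away from the most discriminating items early in the test.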

5.

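An overview and classification of CAT item selection strategies

简小珠, 戴海琦, 张敏强, 彭春妹 《心理学探新》 2014,34(5):446-451

Item selection is the key step in the computerized adaptive testing (CAT) process. An item selection strategy aims to achieve high measurement precision while also controlling item exposure rates and meeting other test goals. Based on the underlying principles of the strategies and how they were derived from one another, this paper groups the many CAT item selection strategies into five families: the Fisher function family, the K-LI function family, the a-stratified (α-stratified) family, the Bayesian family, and the b-matching family. It further subdivides these strategies by test goal (measurement precision, item exposure control, content balancing, and multiple constraints) and summarizes the considerations for choosing a CAT item selection strategy.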

6.

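Item selection strategies for computerized adaptive testing under the graded response model (Total citations: 7; self-citations: 3; citations by others: 4)

陈平, 丁树良, 林海菁, 周婕 《心理学报》 2006,38(3):461-467

Item selection strategies in computerized adaptive testing (CAT) have long attracted attention from researchers in China and abroad, yet few studies have been reported on item selection strategies for CAT with polytomous scoring. This study uses computer simulation to examine four item selection strategies for CAT under the Graded Response Model. The results show that the strategy that matches the category difficulty values to the current ability estimate receives the highest overall evaluation; that adding a "shadow item bank" to the selection strategy clearly improves the evenness of item usage; and that different item parameter distributions and different ability estimation methods both affect the CAT evaluation indices.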

7.

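Item selection methods for the initial stage of CD-CAT

高椿雷, 罗照盛, 郑蝉金, 喻晓锋, 彭亚风, 郭小军 《心理科学》 2017,40(2):485-491

CD-CAT is the product of combining CDA with CAT; it is suited to classroom teaching and is an important tool for teachers' remedial instruction and students' self-directed learning. The item selection method used in the initial stage, an important component of CD-CAT, strongly affects the correct classification rate of the test. Building on existing research and on CDA item discrimination indices, this paper proposes four new initial-stage item selection methods: CTTID, CDI, CTTIDR*, and CDIR*. Simulation studies show that, in fixed-length CD-CAT with an HD-HV item bank, at the end of the initial stage the PCCR of the CTTIDR* method was .2999 higher than that of the existing T-matrix method and .1707 higher than that of PWKL, with the same trend for the other item banks. At the end of the whole test, CTTIDR* still had the highest correct classification rate. In variable-length CD-CAT, with the maximum posterior probability required to exceed .7, .8, and .9, the mean test length under CTTIDR* was shorter than under the T-matrix method by 2.6170, 2.2347, and 1.7470 items, respectively.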

8.

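Design of an adaptive multi-group cognitive diagnostic test and its item selection strategy

罗芬, 王晓庆, 丁树良, 熊建华 《心理科学》 2018,(3):720-726

Using the OMST online assembly mode, this paper proposes the adaptive multi-group cognitive diagnostic test (CD-AMGT). Because the prerequisite relation among knowledge states is a partial order and forms a lattice, the supremum and infimum in the lattice of the current knowledge-state estimate are used to bound the possible range of the examinee's true knowledge state, and the next group of items is assembled accordingly. Within each group, items are selected with the PWKL or SHE strategy so as to balance diagnostic accuracy, efficiency, and security. Simulations show that, compared with PWKL and SHE, when the item bank contains a rich variety of item types, CD-AMGT offers considerable advantages in both evenness of item bank usage and computation time, at the cost of a slight drop in classification accuracy.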

9.

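A review of test security control methods in computerized adaptive testing (Total citations: 1; self-citations: 0; citations by others: 1)

李铭勇, 张敏强, 简小珠 《心理科学进展》 2010,18(8):1339-1348

In practical applications, computerized adaptive testing has faced questions about test security. Researchers in China and abroad have proposed test security control methods along two main lines. The first is to control the maximum item exposure rate; methods developed along this line include the SH method, the item eligibility method, and the multiple maximum exposure rates method. The second is to improve the item selection strategy; methods developed along this line are mainly the stratification method and its variants. In addition, recent years have seen research that combines test security control methods. This paper discusses the strengths and weaknesses of these methods in terms of mean squared error, item exposure rate, item bank utilization, and other indices, outlines their development history and the reasoning behind them, and looks ahead to future research trends.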

10.

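Item selection methods for computerized adaptive testing that incorporate item response time

郭治辰, 汪大勋, 蔡艳, 涂冬波 《心理科学》 2021,(5):1241-1248

Computer-based tests can record examinees' item response times (Response Time, RT). As an important source of auxiliary information, RT is valuable for test development and management, especially in Computerized Adaptive Testing (CAT). This paper briefly introduces and comments on applications of RT to item selection in CAT and analyzes the practical feasibility of these techniques. Finally, it discusses current problems in applying RT to CAT item selection and directions for further research.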

11.

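Development of a computerized adaptive puzzle test for conscription candidates (Total citations: 1; self-citations: 0; citations by others: 1)

田建全, 苗丹民, 杨业兵, 何宁, 肖玮 《心理学报》 2009,41(2):167-174

Drawing on a literature review and on reference materials from foreign militaries, an item bank was developed according to item response theory (IRT) and theories of spatial ability testing. First, a pilot study in paper-and-pencil form examined the feasibility of developing a CAT puzzle test with IRT. Then, on the basis of the pilot study, the items were revised and their number expanded to build a computer-assisted test. The three-parameter logistic model and an anchor-item equating design were adopted, and 7 different test forms were administered to 55,777 conscription candidates during the national conscription psychological screening. Based on the results, the items were analyzed and high-quality items were selected to form the CAT item bank; stratified sampling on the a parameter was used to control exposure rates, and different termination rules were used to build the CAT puzzle test. Finally, with the Block Design subtest of the WAIS intelligence test and examination scores in three school subjects as criteria, the validity of the CAT puzzle test was checked on 72 participants. The results show that the test meets the assumptions of the three-parameter logistic IRT model, that the item parameters are satisfactory, and that the test has good reliability and validity and can be used in the psychological selection of conscription candidates.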

12.

Computerized adaptive testing under nonparametric IRT models (Total citations: 1; self-citations: 0; citations by others: 1)

Xueli Xu, Jeff Douglas 《Psychometrika》 2006,71(1):121-137

Nonparametric item response models have been developed as alternatives to the relatively inflexible parametric item response models. An open question is whether it is possible and practical to administer computerized adaptive testing with nonparametric models. This paper explores the possibility of computerized adaptive testing when using nonparametric item response models. A central issue is that the derivatives of item characteristic curves may not be estimated well, which eliminates the availability of the standard maximum Fisher information criterion. As alternatives, procedures based on Shannon entropy and Kullback–Leibler information are proposed. For a long test, these procedures, which do not require the derivatives of the item characteristic curves, become equivalent to the maximum Fisher information criterion. A simulation study is conducted to study the behavior of these two procedures, compared with random item selection. The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error, and perform much better than random item selection. The study also shows that item exposure rates need to be addressed for these methods to be practical. The authors would like to thank Hua Chang for his help in conducting this research.
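A derivative-free selection rule of the Shannon entropy kind can be sketched as follows. The sketch assumes a discrete grid of trait levels, a posterior over that grid, and item characteristic curves stored simply as probabilities of a correct response at each grid point. The function names and the grid representation are illustrative assumptions, not the procedure as specified in the paper.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def posterior(prior, icc, response):
    """Bayes update on the grid; icc[k] = P(correct | theta_k)."""
    like = [q if response == 1 else 1.0 - q for q in icc]
    unnorm = [pr * l for pr, l in zip(prior, like)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def expected_entropy(prior, icc):
    """Posterior entropy averaged over the two possible responses."""
    p_correct = sum(pr * q for pr, q in zip(prior, icc))
    h = 0.0
    for response, p_resp in ((1, p_correct), (0, 1.0 - p_correct)):
        if p_resp > 0:
            h += p_resp * entropy(posterior(prior, icc, response))
    return h


def select_min_entropy(prior, iccs, administered):
    """Pick the unused item that minimizes expected posterior entropy."""
    available = [i for i in range(len(iccs)) if i not in administered]
    return min(available, key=lambda i: expected_entropy(prior, iccs[i]))


prior = [0.25, 0.25, 0.25, 0.25]
iccs = [
    [0.5, 0.5, 0.5, 0.5],   # flat curve: tells nothing about theta
    [0.1, 0.1, 0.9, 0.9],   # sharp step: separates low from high theta
    [0.4, 0.5, 0.5, 0.6],   # shallow slope
]
chosen = select_min_entropy(prior, iccs, set())
```

Only the curve values on the grid are used, so no derivative of the item characteristic curve is needed.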

13.

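Item selection strategies for computerized adaptive testing with an exposure factor

程小扬, 丁树良, 严深海, 朱隆尹 《心理学报》 2011,43(2):203-212

In research on computerized adaptive testing (CAT), devising item selection strategies that are both efficient and secure is a standing goal. Selecting items by the maximum item information criterion (MIC) gives high test efficiency and accurate ability estimates, but items are used very unevenly, which compromises test security. The a-stratified method (a-STR) improves test security by controlling item exposure rates, but it may slightly reduce test efficiency, and it cannot adjust for discrimination within each stratum. Addressing the strengths and weaknesses of these two strategies, this paper improves MIC and a-STR for dichotomously (0-1) scored CAT by introducing an exposure factor, automatically adjusting the influence of discrimination stage by stage, and improving selection accuracy, yielding two new classes of item selection strategies. Computer simulations show that the new methods perform well.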

14.

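Nonparametric cognitive diagnostic computerized adaptive testing that uses item option information

孙小坚, 郭磊 《心理学报》 2022,54(9):1137-1150

The response options of multiple-choice items can provide additional diagnostic information. To make full use of option information, this study proposes two nonparametric item selection strategies and variable-length termination rules for handling multiple-choice option information in cognitive diagnostic computerized adaptive testing (CD-CAT). The simulation results show that: (1) under fixed-length conditions, the two nonparametric strategies have higher overall classification accuracy than the parametric strategies; (2) the two nonparametric strategies yield more balanced item bank usage than the parametric strategies; (3) the nonparametric strategies achieve higher classification accuracy under the two new variable-length termination rules; (4) both nonparametric strategies are suitable for multiple-choice CD-CAT, and users may choose either one.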

15.

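Estimating the item parameters of raw items in computerized adaptive testing (Total citations: 1; self-citations: 1; citations by others: 0)

游晓锋, 丁树良, 刘红云 《心理学报》 2010,42(7):813-820

The security of Computerized Adaptive Testing (CAT) faces new challenges, and small item banks are under greater threat. How to build a large, high-quality item bank has become a very important topic in CAT research. Current CAT item bank construction has several problems, such as high cost and poor confidentiality; in particular, equating techniques are complex and the repeated use of anchor items easily leads to leaks. If items whose parameters have not yet been estimated (raw items) could be inserted during CAT administration and their parameters estimated at the same time, this would be of obvious value for building large, high-quality CAT item banks. This paper studies the problem under the 1PLM and 2PLM, proposing a new method for online estimation of raw items and deriving a formula for the initial iteration value of the discrimination parameter a. The results show that, in both simulation and empirical studies, the number of times a raw item is answered affects the item parameter estimates, and the more examinees answer a raw item, the more precise its parameter estimates.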

16.

Item selection methods with exposure and time control for computerized classification test

Yingshi Huang, He Ren, Ping Chen 《The British journal of mathematical and statistical psychology》 2023,76(1):52-68

Computerized classification testing (CCT) commonly chooses items maximizing information at the cut score, which yields the most information for decision-making. However, a corollary problem is that all examinees will be given the same set of items, resulting in high test overlap rate and unbalanced item bank usage, which threatens test security. Moreover, another pivotal issue for CCT is time control. Since both the extremely long response time (RT) and large RT variability across examinees intensify time-induced anxiety, it is crucial to reduce the number of examinees exceeding the time limitation and the differences between examinees' test-taking times. To satisfy these practical needs, this paper proposes the novel idea of stage adaptiveness to tailor the item selection process to the decision-making requirement in each step and generate fresh insight into the existing response time selection method. Results indicate that a balanced item usage as well as short and stable test times across examinees can be achieved via the new methods.

17.

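A Gini-index-based item selection strategy for dual-objective CD-CAT

罗芬, 王晓庆, 蔡艳, 涂冬波 《心理学报》 2020,52(12):1452-1465

The results of dual-objective CD-CAT can be used for both formative and summative assessment. The Gini index measures the degree of uncertainty of a random variable: the smaller its value, the lower the uncertainty. This paper uses the Gini index to measure changes in the posterior probabilities of examinees' knowledge-state classes and of ability-estimate confidence intervals, and proposes item selection strategies based on it. Monte Carlo experiments show that, compared with existing strategies, the new strategies achieve high accuracy in both knowledge-state classification and ability estimation, while also keeping item bank usage even and responding quickly in real time. They are little affected by the cognitive diagnostic model or by the distribution of examinees' knowledge states, and can be used with mixed item banks that contain multiple cognitive diagnostic models in practical testing.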

18.

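A Gini-index-based item selection strategy for cognitive diagnostic computerized adaptive testing

罗芬, 王晓庆, 蔡艳, 涂冬波 《心理科学》 2021,(2):440-448

The Gini index can describe the unevenness of a distribution and is widely used in decision tree algorithms. This paper develops a Gini-index-based item selection strategy for cognitive diagnostic computerized adaptive testing and compares it with the SHE, MPWKL, GDI, and PWKL strategies under saturated and reduced models. Simulation studies show that, compared with SHE, MPWKL, and GDI, the Gini-based strategy achieves similar classification accuracy while improving item bank utilization; compared with PWKL, it improves classification accuracy and selection speed. Overall, the Gini-based strategy balances classification accuracy and evenness of item bank usage.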
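As a rough illustration of how a Gini index can drive item selection, the Python sketch below uses the decision-tree form of the index, one minus the sum of squared probabilities, applied to the posterior over knowledge states. It picks the item with the smallest expected posterior Gini index. This is an assumed, simplified rule with hypothetical function names; the strategy actually proposed in the paper may be defined differently.

```python
def gini(p):
    """Gini index of a discrete distribution: 1 - sum of squared probabilities."""
    return 1.0 - sum(x * x for x in p)


def expected_gini(prior, p_correct):
    """Expected posterior Gini index after a 0/1 response.

    prior[k]: current probability of knowledge state k;
    p_correct[k]: P(correct response | knowledge state k).
    """
    total = 0.0
    for response in (1, 0):
        like = [q if response == 1 else 1.0 - q for q in p_correct]
        joint = [pr * l for pr, l in zip(prior, like)]
        p_resp = sum(joint)
        if p_resp > 0:
            # Weight the Gini index of each posterior by the response probability.
            total += p_resp * gini([j / p_resp for j in joint])
    return total


def select_min_gini(prior, items, administered):
    """Pick the unused item with the smallest expected posterior Gini index."""
    available = [i for i in range(len(items)) if i not in administered]
    return min(available, key=lambda i: expected_gini(prior, items[i]))


prior = [0.25, 0.25, 0.25, 0.25]
items = [
    [0.2, 0.2, 0.2, 0.2],   # same success rate in every state: uninformative
    [0.2, 0.2, 0.8, 0.8],   # separates the first two states from the last two
]
chosen = select_min_gini(prior, items, set())
```

The Gini index needs only squares and sums rather than logarithms, which is one plausible reason for the speed advantage the abstract reports over PWKL.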

19.

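Cognitive diagnostic computerized adaptive testing with polytomous scoring

蔡艳, 苗莹, 涂冬波 《心理学报》 2016,48(10):1338-1346

Building on dichotomously (0-1) scored CD-CAT, this paper develops cognitive diagnostic models and item selection strategies suitable for polytomously scored CD-CAT (psCD-CAT), providing methodological support for its implementation. Monte Carlo simulation results show that the proposed polytomous CD-CAT has satisfactory attribute classification accuracy, test efficiency, and item bank security, and can be used for CD-CAT with polytomously scored data. The simulations also show that, overall, the PS-PWKL and PS-HKL strategies achieve high attribute classification accuracy, item bank security, and test efficiency, and both outperform the PS-KL strategy. In sum, this study provides cognitive diagnostic models and item selection strategies for further extending the practical application of CD-CAT.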

20.

Item Selection in Multidimensional Computerized Adaptive Testing—Gaining Information from Different Angles

Chun Wang, Hua-Hua Chang 《Psychometrika》 2011,76(3):363-384

Over the past thirty years, obtaining diagnostic information from examinees’ item responses has become an increasingly important feature of educational and psychological testing. The objective can be achieved by sequentially selecting multidimensional items to fit the class of latent traits being assessed, and therefore Multidimensional Computerized Adaptive Testing (MCAT) is one reasonable approach to such a task. This study conducts a rigorous investigation of the relationships among four promising item selection methods: D-optimality, KL information index, continuous entropy, and mutual information. Some theoretical connections among the methods are demonstrated to show how information about the unknown vector θ can be gained from different perspectives. Two simulation studies were carried out to compare the performance of the four methods. The simulation results showed that mutual information not only improved the overall estimation accuracy but also yielded the smallest conditional mean squared error in most regions of θ. In the end, the overlap rates were calculated to empirically show the similarities and differences among the four methods.