Found 20 similar documents; search took 78 ms.
1.
Investigating the relationship between item exposure and test overlap: Item sharing and item pooling
Shu‐Ying Chen, Pui‐Wa Lei 《The British journal of mathematical and statistical psychology》2010,63(1):205-226
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed‐length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
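A small sketch can make the two overlap indices concrete. It relies on a simplification of ours, which is not necessarily the paper's derivation: each examinee's test is assumed to contain item i independently with probability equal to its exposure rate r_i. Under that assumption, the expected item sharing among g examinees is Σ r_i^g. The expected item pooling for one examinee against the other g − 1 examinees is Σ r_i(1 − (1 − r_i)^(g−1)). The function names are hypothetical.

```python
def expected_sharing(rates, g):
    """Expected number of items common to all g examinees' tests.

    Assumes independent administrations: item i appears in each test
    with probability rates[i] (its exposure rate).
    """
    return sum(r ** g for r in rates)


def expected_pooling(rates, g):
    """Expected number of one examinee's items that also appear in at
    least one of the other g - 1 examinees' tests (same assumption)."""
    return sum(r * (1.0 - (1.0 - r) ** (g - 1)) for r in rates)


# Hypothetical pool: N = 10 items, test length L = 5, perfectly uniform exposure.
rates = [0.5] * 10
print(expected_sharing(rates, 2), expected_pooling(rates, 2))  # 2.5 2.5
print(expected_sharing(rates, 3), expected_pooling(rates, 3))  # 1.25 3.75
```

For g = 2, both indices reduce to Σ r_i², which is the expected pairwise overlap. As g grows, sharing shrinks while pooling grows, so the two indices call for different control targets.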
2.
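Item exposure control and content constraints bear on test security, reliability, and validity, and they are two important classes of non-statistical constraints in computerized adaptive testing (CAT). This paper uses five methods to select items in cognitive diagnostic CAT under content-constraint and item-exposure-control requirements:
(1) the Monte Carlo method combined with the item eligibility method (MC-IE);
(2) the Monte Carlo method combined with the maximum priority index method (MC-MPI);
(3) the Monte Carlo method combined with the restrictive threshold method (MC-RT);
(4) the Monte Carlo method combined with the restrictive progressive method (MC-RPG);
(5) the Monte Carlo method combined with the maximum posterior probability method (MC-PP).
Item banks were then constructed under five attribute structures (linear, convergent, divergent, unstructured, and independent), and examinee responses were simulated with the reparameterized unified model to compare how the methods perform in item selection. The findings are as follows. (1) For a given selection method, the distribution of item exposure rates is similar across attribute structures, while measurement precision decreases in the order linear, convergent, divergent, unstructured, independent. (2) Under the same attribute structure, measurement precision ranks, from highest to lowest, MC-PP, MC-IE, MC-RT, MC-MPI, and MC-RPG; uniformity of item exposure ranks, from best to worst, MC-RPG, MC-MPI, MC-RT, MC-IE, and MC-PP. Standardized (dimensionless) composite values show that MC-RPG performs best overall, followed by MC-MPI.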
3.
《The British journal of mathematical and statistical psychology》2005,58(2):239-257
In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (θ) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item‐overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non‐statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS‐MI), integrating the MS and MI approaches, in testing with non‐statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS–MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS–MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed.
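The MI rule that the abstract starts from can be sketched under the two-parameter logistic (2PL) model. The 2PL choice is an assumption, since the abstract does not fix the IRT model, and so is the scaling constant D = 1.7. Peak 2PL information grows with a², so the rule keeps returning the most discriminating items. This is the source of the unbalanced exposure described above. The item parameters are hypothetical.

```python
import math


def p_2pl(theta, a, b, D=1.7):
    """2PL probability of a correct response (D = 1.7 scaling is an assumption)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def fisher_info_2pl(theta, a, b, D=1.7):
    """Fisher information of a 2PL item: (D*a)^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b, D)
    return (D * a) ** 2 * p * (1.0 - p)


def select_max_info(theta_hat, pool, administered):
    """Return the index of the unused item with maximum information at theta_hat.

    pool is a list of (a, b) tuples; administered is a set of used indices.
    """
    candidates = [j for j in range(len(pool)) if j not in administered]
    return max(candidates, key=lambda j: fisher_info_2pl(theta_hat, *pool[j]))


pool = [(0.8, 0.0), (2.0, 0.0), (1.2, 0.1)]  # hypothetical (a, b) values
print(select_max_info(0.0, pool, set()))  # 1: the high-a item wins
print(select_max_info(0.0, pool, {1}))    # 2
```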
4.
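Existing item selection strategies in CD-CAT focus on test efficiency and pay too little attention to item bank utilization. To address this, two new item selection strategies based on the DINA model, KLED and RHA, are introduced, and HA is also examined in a simulation study. The results show the following. PWKL and KLED have an advantage only in test efficiency. Stratifying KLED by attribute vector improves item bank utilization, and KLED generalizes more easily than ED to other diagnostic models that have explicit expressions. HA, RHA, and RP-PWKL balance test validity and item bank utilization well, but RP-PWKL requires setting a threshold for the maximum item exposure rate. Both new methods have practical value in fixed-length and variable-length CD-CAT.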
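The PWKL index that serves as a benchmark here can be sketched for the DINA model with a uniform prior over attribute patterns. The sketch uses the usual posterior-weighted Kullback–Leibler form. For each candidate pattern, it computes the KL distance between the item's response distribution at the current estimate α̂ and at that pattern, then weights it by the pattern's posterior probability. The item parameters, the uniform prior, and the function names are illustrative assumptions. The sketch does not implement the paper's KLED or RHA methods.

```python
import itertools
import math


def dina_p(alpha, q, slip, guess):
    """DINA probability of a correct response: 1 - slip if every attribute
    required by the q-vector is mastered, otherwise guess."""
    eta = all(a >= qk for a, qk in zip(alpha, q))
    return 1.0 - slip if eta else guess


def posterior(patterns, responses, items):
    """Posterior over attribute patterns under a uniform prior."""
    weights = []
    for alpha in patterns:
        lik = 1.0
        for x, (q, s, g) in zip(responses, items):
            p = dina_p(alpha, q, s, g)
            lik *= p if x == 1 else 1.0 - p
        weights.append(lik)
    total = sum(weights)
    return [w / total for w in weights]


def pwkl(item, alpha_hat, patterns, post):
    """Posterior-weighted KL information of one dichotomous DINA item."""
    q, s, g = item
    p_hat = dina_p(alpha_hat, q, s, g)
    value = 0.0
    for alpha_c, w in zip(patterns, post):
        p_c = dina_p(alpha_c, q, s, g)
        kl = p_hat * math.log(p_hat / p_c) + (1 - p_hat) * math.log((1 - p_hat) / (1 - p_c))
        value += w * kl
    return value


# Hypothetical two-attribute bank: (q-vector, slip, guess).
bank = [((1, 0), 0.2, 0.2), ((0, 1), 0.2, 0.2), ((1, 1), 0.2, 0.2)]
patterns = list(itertools.product([0, 1], repeat=2))
post = posterior(patterns, [1], [bank[0]])  # item 0 answered correctly
alpha_hat = patterns[max(range(len(post)), key=post.__getitem__)]  # MAP estimate
best = max([1, 2], key=lambda j: pwkl(bank[j], alpha_hat, patterns, post))
print(alpha_hat, best)  # (1, 0) 1
```

A correct answer on item 0 points to mastery of the first attribute. After that, the item that isolates the second attribute carries the most posterior-weighted information.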
5.
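Item exposure rates bear on item bank construction and test security and are an important consideration in computerized adaptive testing (CAT). For cognitive diagnostic CAT, this study first proposes the stratified multistage (SM) item selection method. Following the idea of the a-stratified method in traditional CAT, SM stratifies the item bank by item information. The SM method is then combined with the item eligibility (IE) method to obtain the SMIE method. A simulation study then compares the item selection performance of SM, IE, SMIE, the maximum modified priority index (MMPI) method, the restrictive threshold (RT) method, and the restrictive progressive (RPG) method. Overall, measurement precision ranks, from highest to lowest, IE, SM, SMIE, RT, RPG, and MMPI. Uniformity of the item exposure distribution ranks, from best to worst, MMPI, RPG, SMIE, RT, SM, and IE. SMIE and RT balance the requirements of measurement precision and uniform item exposure well.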
6.
7.
In this article, four item selection methods in computerized adaptive testing are examined in terms of classification accuracy and consistency, including two popular heuristics for constraint management, the maximum priority index (MPI) method and the weighted deviation modeling method, as well as the widely known maximum Fisher information method and randomized item selection as baselines. Results suggest that the MPI method is able to meet constraints and keep test overlap rate low. Among the four methods, it is the only one that manages to produce parallel forms in terms of content coverage and, consequently, the only method to which the idea of classification consistency applies. With tests as short as 12 items, the MPI method does fairly well in classifying examinees accurately and consistently. Its performance improves with longer tests. The effects of number of decision categories and cut score locations are also examined. Recommendations are made in the Discussion section.
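The MPI heuristic can be sketched in its commonly described upper-bound form. The abstract itself gives no formula, so treat this form as an assumption. An item's Fisher information is multiplied by w_k · f_k for every constraint k that the item counts toward. Here f_k = (X_k − x_k)/X_k, where X_k is the maximum number of items allowed under constraint k and x_k is the number already used. An item whose constraint quota is exhausted therefore gets priority zero. The names and values are hypothetical.

```python
def priority_index(info, item_constraints, upper_bounds, used, weights=None):
    """Priority index of one item.

    info: Fisher information of the item at the current theta estimate.
    item_constraints: names of constraints the item counts toward (c_ik = 1).
    upper_bounds / used: maximum allowed and already-used counts per constraint.
    weights: optional importance weight per constraint (default 1.0).
    """
    weights = weights or {}
    pi = info
    for k in item_constraints:
        f_k = (upper_bounds[k] - used[k]) / upper_bounds[k]  # share of quota left
        pi *= weights.get(k, 1.0) * f_k
    return pi


def select_mpi(items, upper_bounds, used, weights=None):
    """items: dict item_id -> (info, constraint names). Returns the max-PI item."""
    return max(items, key=lambda j: priority_index(items[j][0], items[j][1],
                                                   upper_bounds, used, weights))


# Hypothetical state: content area A has already used its quota of 2 items.
items = {"i1": (3.0, ["A"]), "i2": (1.0, ["B"])}
print(select_mpi(items, {"A": 2, "B": 2}, {"A": 2, "B": 0}))  # i2
```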
8.
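Reviewable cognitive diagnostic computerized adaptive testing (RCD-CAT), which allows examinees to revise their answers, helps diagnose examinees' knowledge states more accurately. The item pocket (IP) method lets examinees set items aside in a buffer so that they can answer or revise them later. The modified IP (MIP) method rescores items revised within the pocket. A simulation study compared IP, MIP, Stocking I, and Stocking II in RCD-CAT. The Stocking designs performed best, with Stocking II slightly better than Stocking I. The IP and MIP methods had lower classification accuracy than traditional CD-CAT. The Stocking designs show good prospects for use in RCD-CAT.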
9.
10.
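Drawing on ideas from traditional computerized adaptive testing and on the characteristics of cognitive diagnosis, this study proposes four termination rules for variable-length CD-CAT within the cognitive diagnostic framework: the attribute standard error method (SEA), the difference between adjacent posterior probabilities method (DAPP), the halving algorithm (HA), and the hybrid method (HM). These were compared with the HSU method and the KL method, both without exposure control and under different exposure-control conditions. The results show the following.
(1) The stricter the termination criterion, the longer the average test length, the larger the percentage of tests terminated at the maximum test length, and the higher the pattern classification accuracy.
(2) Without exposure control, all four new termination rules perform well and are very close to the HSU method. As the preset maximum posterior probability increases or e decreases, pattern classification accuracy rises and average test length gradually increases, but item bank utilization is poor in all cases.
(3) With item exposure control added, item bank utilization under the six variable-length termination rules improves greatly while pattern classification accuracy remains high. Different exposure-control methods affect the termination rules differently, and the relative-criterion termination rule is especially susceptible to the choice of exposure-control method.
(4) Overall, SEA, HM, and HA perform essentially on par with the HSU method across the indices, followed by KL and DAPP.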
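All of these rules are a stopping check run after each item. Below is a minimal sketch of a maximum-posterior-probability rule with a test-length cap. The threshold and the cap are hypothetical values, and the sketch is not any of the specific rules compared in the study (SEA, DAPP, HA, HM, HSU, KL).

```python
def should_stop(posterior, n_administered, threshold=0.8, max_length=30):
    """Variable-length stopping check for CD-CAT.

    posterior: current posterior probabilities of the attribute patterns.
    Stops when the most likely pattern reaches `threshold`, or when the
    test hits `max_length` items (both values are hypothetical).
    Returns (stop, reason).
    """
    if max(posterior) >= threshold:
        return True, "posterior"
    if n_administered >= max_length:
        return True, "max_length"
    return False, ""


print(should_stop([0.85, 0.10, 0.05], 7))   # (True, 'posterior')
print(should_stop([0.50, 0.30, 0.20], 7))   # (False, '')
print(should_stop([0.50, 0.30, 0.20], 30))  # (True, 'max_length')
```

Raising `threshold` makes the rule stricter. More examinees then run to `max_length`, which matches the direction of finding (1).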
11.
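This study develops two new item selection strategies for multidimensional computerized adaptive testing with polytomously scored items (PMCAT): the revised continuous entropy method (RCEM) and the modified posterior expected Kullback–Leibler information method (MKB). It compares them with existing PMCAT item selection strategies. Monte Carlo results show that both new strategies estimate more precisely than the original methods, and that RCEM has the lowest exposure rate of all the strategies. The new strategies combine good estimation precision with effective exposure control and provide new methodological support for applying PMCAT in practice.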
12.
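Test security and item bank utilization are important in computerized adaptive testing, especially in high-stakes tests. The traditional SHGT method can control item exposure rates and the generalized test overlap rate at the same time, but its item bank utilization is poor. The a-stratified method improves item bank utilization but does not control overexposed items sufficiently. This study combines the idea of the a-stratified method with the SHGT method to draw on the strengths of each, and proposes three new item selection methods: SHGT_a, SHGT_b, and SHGT_c. The results show the following.
(1) Compared with SHGT, all the new methods effectively control item exposure rates and the generalized test overlap rate while greatly improving item bank utilization.
(2) As the preset item exposure rate (rmax) and generalized test overlap rate ( ) increase and the number of item-sharing examinees a decreases, the precision of the new methods' ability estimates tends to rise, and they still keep much higher item bank utilization than SHGT.
(3) When the correlation between discrimination and difficulty (rab) is large, SHGT_b and SHGT_c outperform SHGT_a in ability estimation precision.
(4) Under different proportions of test content, all three new methods estimate examinee ability well.
(5) Compared with SHGT, the new methods effectively mitigate the over-control of item exposure rates.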
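The study builds on the a-stratified idea, which can be sketched as follows:
- sort the pool by discrimination a and split it into strata;
- administer low-a strata early and high-a strata late;
- within a stratum, match difficulty b to the current ability estimate.

This follows the general a-stratified scheme, not the SHGT_a/b/c variants, whose details the abstract does not give. The (a, b) values are hypothetical. In a full test, the stage would typically be derived from the item's position, for example stage = position × K // L.

```python
def build_strata(pool, n_strata):
    """Split item indices into n_strata groups of ascending discrimination a.

    pool: list of (a, b) tuples. Strata sizes differ by at most one item.
    """
    order = sorted(range(len(pool)), key=lambda j: pool[j][0])
    n = len(order)
    return [order[k * n // n_strata:(k + 1) * n // n_strata] for k in range(n_strata)]


def select_a_stratified(theta_hat, pool, strata, stage, administered):
    """At stage k, pick the unused item in stratum k whose difficulty b is
    closest to the current ability estimate."""
    candidates = [j for j in strata[stage] if j not in administered]
    return min(candidates, key=lambda j: abs(pool[j][1] - theta_hat))


pool = [(0.5, 0.0), (1.5, 0.2), (0.6, 1.0), (1.4, -0.5)]  # hypothetical (a, b)
strata = build_strata(pool, 2)
print(strata)                                            # [[0, 2], [3, 1]]
print(select_a_stratified(0.9, pool, strata, 0, set()))  # 2
print(select_a_stratified(0.0, pool, strata, 1, set()))  # 1
```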
13.
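Zhang (2013) proposed the sequential monitoring procedure (SMP) to detect whether CAT items have been leaked during administration. However, the method produces false alarms, and it does not consider how item leakage affects the precision of ability estimation. Building on SMP, this study introduces a person-fit index and proposes the SMP_PFI method. SMP_PFI aims to verify, at a given confidence level, whether items flagged by SMP have actually been leaked. The study also examines how SMP_PFI affects the relationship between ability estimation precision and the number of items withdrawn from the pool. The results show that the new method effectively reduces the Type I error of SMP used alone. Adjusting the CPFI value allows a balance between ability estimation precision and the number of withdrawn items.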
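The abstract does not say which person-fit index SMP_PFI adds. As an illustration only, here is a sketch of the standardized log-likelihood index l_z, a widely used person-fit statistic. Large negative values flag response patterns that are unlikely under the model, such as correct answers to hard items combined with misses on easy ones. The probabilities are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z.

    responses: 0/1 item scores; probs: model probabilities of a correct
    response at the examinee's ability estimate. Large negative values
    indicate aberrant response patterns.
    """
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p) for u, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


probs = [0.9, 0.9, 0.1, 0.1]  # hypothetical: two easy and two hard items
print(round(lz_statistic([1, 1, 0, 0], probs), 2))  # 0.67 (consistent pattern)
print(round(lz_statistic([0, 0, 1, 1], probs), 2))  # -6.0 (aberrant pattern)
```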
14.
Computerized adaptive testing under nonparametric IRT models
Nonparametric item response models have been developed as alternatives to the relatively inflexible parametric item response models. An open question is whether it is possible and practical to administer computerized adaptive testing with nonparametric models. This paper explores the possibility of computerized adaptive testing when using nonparametric item response models. A central issue is that the derivatives of item characteristic curves may not be estimated well, which rules out the standard maximum Fisher information criterion. As alternatives, procedures based on Shannon entropy and Kullback–Leibler information are proposed. For a long test, these procedures, which do not require the derivatives of the item characteristic curves, become equivalent to the maximum Fisher information criterion. A simulation study is conducted to study the behavior of these two procedures, compared with random item selection. The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error, and perform much better than random item selection. The study also shows that item exposure rates need to be addressed for these methods to be practical.
The authors would like to thank Hua Chang for his help in conducting this research.
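A Shannon-entropy selection rule needs only the item characteristic curve values on a grid of θ points, which is why it suits nonparametric models. The sketch below picks the item that minimizes the expected entropy of the posterior over the grid after the next response. This is one standard formulation of an entropy criterion, not necessarily the paper's exact procedure. The grid, prior, and ICC values are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def update(prior, icc, x):
    """Posterior over the theta grid after response x, plus P(X = x)."""
    unnorm = [w * (p if x == 1 else 1.0 - p) for w, p in zip(prior, icc)]
    total = sum(unnorm)
    return [u / total for u in unnorm], total


def expected_entropy(prior, icc):
    """Posterior entropy averaged over the two possible responses."""
    value = 0.0
    for x in (0, 1):
        post, prob_x = update(prior, icc, x)
        value += prob_x * entropy(post)
    return value


def select_min_entropy(prior, iccs, administered):
    """Index of the unused item with the smallest expected posterior entropy."""
    candidates = [j for j in range(len(iccs)) if j not in administered]
    return min(candidates, key=lambda j: expected_entropy(prior, iccs[j]))


prior = [1 / 3] * 3          # hypothetical three-point theta grid
iccs = [[0.5, 0.5, 0.5],     # flat ICC: carries no information
        [0.05, 0.5, 0.95]]   # steep ICC
print(select_min_entropy(prior, iccs, set()))  # 1
```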
15.
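Two item exposure control methods for fixed-length CD-CAT, HIRP and HIRT, are proposed. They keep item exposure rates reasonable while maintaining high classification accuracy. Each new method combines the dichotomization method with the RP or RT method, with appropriate adjustments. A simulation study compared them with RP, RT, SM, SMIE, RHA, and SDBS. The results show the following. (1) HIRP outperforms SM, SMIE, and SDBS in both classification accuracy and item exposure. (2) HIRT's item exposure is slightly worse than that of RP, SM, SMIE, RHA, and SDBS, but its classification accuracy is higher. (3) HIRP's classification accuracy is lower than that of RT and RP, but its item exposure control is better.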
16.
Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods maintain a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.
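The first proposed method maximizes item information per time unit. The sketch below assumes van der Linden's lognormal response-time model, in which log T ~ N(β_j − τ, 1/α_j²). This model is an assumption, since the abstract does not name the RT model. Under it, the expected time on item j for an examinee with speed τ is exp(β_j − τ + 1/(2α_j²)), and the code ranks items by information divided by that expected time. The parameters are hypothetical. The information values could be PWKL values, but this is not the authors' modified PWKL or shadow-test method.

```python
import math


def expected_time(beta, alpha, tau):
    """Mean response time under a lognormal RT model:
    log T ~ Normal(beta - tau, 1 / alpha**2), so
    E[T] = exp(beta - tau + 1 / (2 * alpha**2)).
    beta: item time intensity, alpha: item time discrimination, tau: speed."""
    return math.exp(beta - tau + 1.0 / (2.0 * alpha ** 2))


def select_info_per_time(info, time_params, tau_hat, administered):
    """Pick the unused item maximizing information per expected time unit.

    info[j]: any item information value already computed (e.g. PWKL);
    time_params[j]: (beta_j, alpha_j)."""
    candidates = [j for j in range(len(info)) if j not in administered]
    return max(candidates, key=lambda j: info[j] / expected_time(*time_params[j], tau_hat))


info = [1.0, 0.8]                       # hypothetical information values
time_params = [(1.0, 2.0), (0.0, 2.0)]  # item 0 is slower on average
print(select_info_per_time(info, time_params, 0.0, set()))  # 1
```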
17.
《The British journal of mathematical and statistical psychology》2003,56(2):359-378
Content balancing is often required in the development and implementation of computerized adaptive tests (CATs). In the current study, we propose a modified a‐stratified method, the a‐stratified method with content blocking. As a further refinement of a‐stratified CAT designs, the new method incorporates content specifications into item pool stratification. Simulation studies were conducted to compare the new method with three previous item selection methods: the a‐stratified method; the a‐stratified with b‐blocking method; and the maximum Fisher information method with Sympson‐Hetter exposure control. The results indicated that the refined a‐stratified design performed well in reducing item overexposure rates, balancing item usage within the pool, and maintaining measurement precision, in a situation where all four procedures were forced to balance content.
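The Sympson‐Hetter control used as a comparison condition has two parts. At test time, each selected item passes an administration lottery. Before operational use, the exposure parameters K_j are calibrated iteratively through simulations. The sketch below shows both parts in minimal form; the fallback rule and all values are hypothetical.

```python
import random


def administer_with_sh(ranked_items, k_params, rng):
    """Sympson-Hetter administration step.

    ranked_items: candidate items, best first (e.g. by Fisher information).
    k_params[j]: probability of administering item j once it is selected.
    Items that fail the lottery are skipped for this examinee.
    """
    for j in ranked_items:
        if rng.random() < k_params[j]:
            return j
    return ranked_items[-1]  # simplifying fallback (assumption): use the last candidate


def update_k(selection_rates, r_max):
    """One calibration round: K_j = 1 if P(S_j) <= r_max, else r_max / P(S_j)."""
    return [1.0 if s <= r_max else r_max / s for s in selection_rates]


rng = random.Random(0)
print(administer_with_sh([0, 1, 2], [0.0, 1.0, 1.0], rng))  # 1
print(update_k([0.5, 0.1], 0.25))                           # [0.5, 1.0]
```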
18.
Juan Ramón Barrada, Julio Olea, Vicente Ponsoda, Francisco José Abad 《The British journal of mathematical and statistical psychology》2008,61(2):493-513
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This leads to a highly unbalanced distribution of item‐exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive methods (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
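The progressive method can be sketched as a weighted sum of a random score and the item's information. The random score is drawn uniformly between 0 and the largest available information, and the information weight grows with test position. This follows a common description of the method. The power-function weight and its exponent t are hypothetical stand-ins for the different weight functions the study compares.

```python
import random


def progressive_select(infos, position, test_length, rng, t=1.0):
    """Progressive item selection: mix a random score with item information.

    infos: dict item_id -> information at the current theta estimate.
    position: 1-based position of the item about to be administered.
    t: hypothetical exponent shaping how fast the information weight grows.
    """
    w = ((position - 1) / test_length) ** t  # 0 at the first item, rises with position
    h = max(infos.values())                  # random part drawn from U(0, max info)
    scores = {j: (1 - w) * rng.uniform(0, h) + w * info for j, info in infos.items()}
    return max(scores, key=scores.get)


rng = random.Random(1)
infos = {"easy_a": 1.0, "high_a": 5.0}  # hypothetical information values
first = [progressive_select(infos, 1, 20, rng) for _ in range(200)]
print(sorted(set(first)))                      # ['easy_a', 'high_a']
print(progressive_select(infos, 20, 20, rng))  # high_a: information dominates late
```

At the first position, selection is purely random, so low-information items still get used. By the last position, the weight of 0.95 on information makes the high-information item win every time.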
19.
In this article, we propose a simplified version of the maximum information per time unit method (MIT; Fan, Wang, Chang, & Douglas, Journal of Educational and Behavioral Statistics 37: 655–670, 2012), or MIT-S, for computerized adaptive testing. Unlike the original MIT method, the proposed MIT-S method does not require fitting a response time model to the individual-level response time data. It is also computationally efficient. The performance of the MIT-S method was compared against that of the maximum information (MI) method in terms of measurement precision, testing time saving, and item pool usage under various item response theory (IRT) models. The results indicated that when the underlying IRT model is the two- or three-parameter logistic model, the MIT-S method maintains measurement precision and saves testing time. It performs similarly to the MI method in exposure control; both result in highly skewed item exposure distributions, due to heavy reliance on the highly discriminating items. If the underlying model is the one-parameter logistic (1PL) model, the MIT-S method maintains the measurement precision and saves a considerable amount of testing time. However, its heavy reliance on time-saving items leads to a highly skewed item exposure distribution. This weakness can be ameliorated by using randomesque exposure control, which successfully balances the item pool usage. Overall, the MIT-S method with randomesque exposure control is recommended for achieving better testing efficiency while maintaining measurement precision and balanced item pool usage when the underlying IRT model is 1PL.
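Randomesque control, which the abstract recommends pairing with MIT-S under the 1PL model, is simple to sketch. Instead of always taking the top item, it draws at random among the n best. The scores can be any selection criterion. The abstract does not give the MIT-S criterion, so the score values here are hypothetical.

```python
import random


def randomesque_select(scores, n_best, rng):
    """Randomesque exposure control: draw at random among the n_best
    highest-scoring available items instead of always taking the top one.

    scores: dict item_id -> selection criterion (information, or
    information per time unit for an MIT-style rule).
    """
    top = sorted(scores, key=scores.get, reverse=True)[:n_best]
    return rng.choice(top)


rng = random.Random(7)
scores = {"i1": 0.4, "i2": 2.0, "i3": 1.5, "i4": 0.9}  # hypothetical values
picks = {randomesque_select(scores, 2, rng) for _ in range(100)}
print(sorted(picks))                       # ['i2', 'i3']
print(randomesque_select(scores, 1, rng))  # i2 (n_best = 1 is plain maximization)
```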