首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed‐length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.  相似文献   

2.
毛秀珍  辛涛 《心理学报》2014,46(12):1910-1922
项目曝光控制和内容约束关系到测验安全、测验的信度和效度, 是计算机化自适应测验(Computerized Adaptive Testing, CAT)中两类重要的非统计约束条件。本文在认知诊断CAT中针对内容约束和项目曝光控制要求, 运用5种方法选择测验项目。它们分别是:(1) Monte Carlo方法与项目合格方法相结合, 记为MC-IE; (2) Monte Carlo方法与最大优先指标方法相结合, 记为MC-MPI; (3) Monte Carlo方法与限制阈值方法相结合, 记为MC-RT; (4) Monte Carlo方法与限制进度指标方法相结合, 记为MC-RPG以及(5) Monte Carlo方法与最大后验概率方法相结合, 记为MC-PP。然后通过在线性、收敛、发散、无结构和独立五种属性结构下构建题库并运用重参化融融统和模型模拟被试反应比较它们的选题表现。研究发现, (1) 相同选题方法在不同属性结构下项目曝光率的分布类似, 测量精度按线性、收敛、发散、无结构和独立结构的顺序依次降低; (2) 相同属性结构下, 不同方法的测量精度高低依次为MC-PP、MC-IE、MC-RT、MC-MPI和MC-RPG方法; 项目曝光均匀性优劣依次为MC-RPG、MC-MPI、MC-RT、MC-IE和MC-PP方法。统一量纲值表明, MC-RPG方法的综合表现最好, MC-MPI方法的表现次之。  相似文献   

3.
In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (θ) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item‐overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non‐statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS‐MI), integrating the MS and MI approaches, in testing with non‐statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS–MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS–MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed.  相似文献   

4.
CD–CAT中已有选题策略较注重测验效率,而对题库使用率不够重视。针对此问题,基于DINA模型,引入两种新的选题策略KLED和RHA,同时对HA进行模拟研究。结果显示:PWKL与KLED只在测验效率上具有优势;KLED若按属性向量分层,题库使用率有所提高,KLED比ED更容易推广到其他有显式表达的诊断模型场合;HA、RHA和RP–PWKL可较好兼顾测验效度和题库使用率,但RP-PWKL需设置项目的最大曝光率阈值。两种新选题方法在定长和变长CD-CAT都具有一定的应用价值。  相似文献   

5.
毛秀珍  辛涛 《心理学报》2013,45(6):694-703
项目曝光率关系到题库建设和测验安全,是计算机化自适应测验(Computerized Adaptive Testing, CAT)需要考虑的重要问题。在认知诊断 CAT 情形下,首先基于传统 CAT 中 a-分层方法的思想提出按项目信息量对题库分层的分层多阶段(Stratified Multistage, SM)选题方法;然后将 SM 方法与项目合格(Item Eligibility, IE)方法相结合得到SMIE方法。在此基础上,开展模拟研究比较SM、IE、SMIE、最大修正优先指标(Maximum Modified Priority Index, MMPI)方法、限制阈值(Restrictive Threshold, RT)方法和限制进度(Restrictive Progressive, RPG)方法的选题表现。总体上,它们的测量精度从高到低依次为IE、SM、SMIE、RT、RPG和MMPI方法;项目曝光分布均匀性的优劣次序为MMPI、RPG、SMIE、RT、SM和IE方法;SMIE和RT方法能较好地平衡测量精度和项目曝光均匀性要求。  相似文献   

6.
陈平  李珍  辛涛 《心理与行为研究》2011,9(2):125-132,153
项目曝光控制是认知诊断计算机化自适应测验(CD-CAT)中亟需解决的重要问题之一。采用蒙特卡洛模拟方法对CD-CAT中五种常用选题策略(随机化方法、KL信息量方法、香农熵方法、后验加权的KL信息量方法和综合后验加权和距离加权的KL信息量方法)的题库使用情况进行探讨。结果发现:四种非随机化选题策略的题库使用均匀性较差、测验重叠率高,从而导致测验安全性较差;香农熵方法的判准率总是最高。今后可以将传统CAT中的项目曝光控制技术融入到CD-CAT选题策略中。  相似文献   

7.
In this article, four item selection methods in computerized adaptive testing are examined in terms of classification accuracy and consistency, including two popular heuristics for constraint management, the maximum priority index (MPI) method and the weighted deviation modeling method, as well as the widely known maximum Fisher information method and randomized item selection as baselines. Results suggest that the MPI method is able to meet constraints and keep test overlap rate low. Among the four methods, it is the only one that manages to produce parallel forms in terms of content coverage and, consequently, the only method to which the idea of classification consistency applies. With tests as short as 12 items, the MPI method does fairly well in classifying examinees accurately and consistently. Its performance improves with longer tests. The effects of number of decision categories and cut score locations are also examined. Recommendations are made in the Discussion section.  相似文献   

8.
允许修改答案的认知诊断计算机化自适应测验(Reviewable Cognitive Diagnostic Computerized Adaptive Testing,RCD-CAT),有利于更准确诊断被试的知识状态,题目口袋法(Item Pocket,IP)为被试提供了缓存作答并修改的机会,改进的题目口袋法(Modified IP,MIP)对IP内修改的题目重新计分。模拟研究比较了IP、MIP、stocking Ⅰ和stocking Ⅱ在RCD-CAT效果,结果发现:stocking设计的效果最优,其中stocking Ⅱ的效果略优于stocking Ⅰ,IP法和MIP法判准率要低于传统CD-CAT,stocking设计在RCD-CAT具有较好的应用前景。  相似文献   

9.
10.
郭磊  郑蝉金  边玉芳 《心理学报》2015,47(1):129-140
本研究借鉴传统计算机化自适应测验的思想, 并结合认知诊断的特点, 在认知诊断框架下提出了4种变长CD-CAT的终止规则, 分别是属性标准误法(SEA)、邻近后验概率之差法(DAPP)、二等分法(HA)以及混合法(HM)。在未控制曝光和采用不同曝光控制条件下, 与HSU法及KL法进行了比较。研究结果表明:(1) 终止条件越严格, 平均测验长度越长, 按测验长度最大值终止的测验百分比越大, 模式判准率越高。(2) 当未加入曝光控制时, 4种新的终止规则均有较好表现, 与HSU法十分接近。随着最大后验概率预设值的增加或e的减小, 模式判准率呈上升趋势, 平均测验长度逐渐增加, 但在题库使用率方面均较差。(3) 当加入项目曝光控制时, 6种变长终止规则下的题库使用率有了极大的提升, 仍能保持较高的模式判准率, 并且不同的曝光控制方法对终止规则的影响是不同的。其中, 相对标准终止规则极易受到曝光控制方法的影响。(4) 综合来看, SEA、HM以及HA法在各项指标上的表现与HSU法基本一致, 其次为KL法和DAPP法。  相似文献   

11.
本研究开发了两种新的适用于多级评分项目的多维计算机化自适应测验(PMCAT)的选题策略——修正的连续熵(RCEM)和修正的后验期望KL信息(MKB)方法,并与以往PMCAT的选题策略进行了对比研究。Monte Carlo实验结果表明:两种新开发的选题策略比原方法估计精度更高,并且RCEM方法在所有选题策略中曝光率最低。新开发的选题策略具有较理想的估计精度和曝光控制效果,为PMCAT在实践中的应用提供了新的方法支持。  相似文献   

12.
郭磊  王卓然  王丰  边玉芳 《心理学报》2014,46(5):702-713
测验安全和题库使用率在计算机化自适应测验中十分重要, 特别是高风险测验。传统的SHGT法兼具同时控制项目曝光率和广义测验重叠率的功能, 但题库使用率较差。a分层法能够提高题库使用率, 但对过度曝光的项目控制不足。本研究将a分层法的思想与SHGT法相结合, 各取所长, 提出了3种新的选题方法:SHGT_a法, SHGT_b法和SHGT_c法。研究结果表明:(1)与SHGT法相比, 新方法均可以在有效地控制项目曝光率和广义测验重叠率同时, 极大地提高题库使用率; (2)随着预设项目曝光率(rmax)和广义测验重叠率( )取值的增大以及共享人数a的减小, 新方法对被试能力估计的精度呈上升趋势。比起SHGT法, 新方法仍能保持很高的题库使用率; (3)当区分度和难度的相关(rab)较大时, SHGT_b和SHGT_c法在能力估计精度方面优于SHGT_a法; (4)在不同的测验考察内容比例下, 3种新方法对被试能力估计的精度均较好; (5)与SHGT法相比, 新方法能够有效地控制项目曝光率过度控制的问题。  相似文献   

13.
郭磊  刘伟 《心理科学》2018,(1):189-195
Zhang(2013)提出了序贯监测程序(SMP)用以检测CAT中的题目在作答过程中是否发生泄漏。然而,该方法会出现虚报且未关注在题目泄漏后,对能力估计精度产生的影响。本研究在SMP基础上引入个人拟合指标,提出SMP_PFI方法,拟在给定的置信度上核实被SMP标记的题目是否真正泄漏,并探查SMP_PFI方法对能力估计精度与被封存题目数量关系的影响。实验结果表明:新方法能够有效降低SMP单独运行时的一类错误。通过控制CPFI值能够平衡能力估计精度与被封存题目数量之间的关系。  相似文献   

14.
Computerized adaptive testing under nonparametric IRT models   总被引:1,自引:0,他引:1  
Nonparametric item response models have been developed as alternatives to the relatively inflexible parametric item response models. An open question is whether it is possible and practical to administer computerized adaptive testing with nonparametric models. This paper explores the possibility of computerized adaptive testing when using nonparametric item response models. A central issue is that the derivatives of item characteristic Curves may not be estimated well, which eliminates the availability of the standard maximum Fisher information criterion. As alternatives, procedures based on Shannon entropy and Kullback–Leibler information are proposed. For a long test, these procedures, which do not require the derivatives of the item characteristic eurves, become equivalent to the maximum Fisher information criterion. A simulation study is conducted to study the behavior of these two procedures, compared with random item selection. The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error, and perform much better than random item selection. The study also shows that item exposure rates need to be addressed for these methods to be practical. The authors would like to thank Hua Chang for his help in conducting this research.  相似文献   

15.
提出了两种适用于定长CD-CAT的题目曝光控制方法(HIRP、HIRT),这些方法在保证较高分类准确率的同时还有较合理的题目曝光率,新方法由二分化方法和RP及RT方法进行结合并适当调整而得到。模拟研究比较了其与RP、RT、SM、SMIE、RHA和SDBS的表现,结果表明: (1)HIRP的分类准确率和题目曝光率均好于SM、SMIE和SDBS;(2)HIRT的题目曝光率较RP、SM、SMIE、RHA和SDBS稍差,但分类准确率更高;(3)HIRP的分类准确率低于RT和RP,但题目曝光控制要更好。  相似文献   

16.
Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods preserve a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.  相似文献   

17.
Content balancing is often required in the development and implementation of computerized adaptive tests (CATs). In the current study, we propose a modified a‐stratified method, the a‐stratified method with content blocking. As a further refinement of a‐stratified CAT designs, the new method incorporates content specifications into item pool stratification. Simulation studies were conducted to compare the new method with three previous item selection methods: the a‐stratified method; the a‐stratified with b‐blocking method; and the maximum Fisher information method with Sympson‐Hetter exposure control. The results indicated that the refined a‐stratified design performed well in reducing item overexposure rates, balancing item usage within the pool, and maintaining measurement precision, in a situation where all four procedures were forced to balance content.  相似文献   

18.
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item‐exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive methods ( Revuelta & Ponsoda, 1998 ) and the proportional method ( Segall, 2004a ), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.  相似文献   

19.
In this article, we propose a simplified version of the maximum information per time unit method (MIT; Fan, Wang, Chang, & Douglas, Journal of Educational and Behavioral Statistics 37: 655–670, 2012), or MIT-S, for computerized adaptive testing. Unlike the original MIT method, the proposed MIT-S method does not require fitting a response time model to the individual-level response time data. It is also computationally efficient. The performance of the MIT-S method was compared against that of the maximum information (MI) method in terms of measurement precision, testing time saving, and item pool usage under various item response theory (IRT) models. The results indicated that when the underlying IRT model is the two- or three-parameter logistic model, the MIT-S method maintains measurement precision and saves testing time. It performs similarly to the MI method in exposure control; both result in highly skewed item exposure distributions, due to heavy reliance on the highly discriminating items. If the underlying model is the one-parameter logistic (1PL) model, the MIT-S method maintains the measurement precision and saves a considerable amount of testing time. However, its heavy reliance on time-saving items leads to a highly skewed item exposure distribution. This weakness can be ameliorated by using randomesque exposure control, which successfully balances the item pool usage. Overall, the MIT-S method with randomesque exposure control is recommended for achieving better testing efficiency while maintaining measurement precision and balanced item pool usage when the underlying IRT model is 1PL.  相似文献   

20.
多级评分计算机化自适应测验动态综合选题策略   总被引:1,自引:0,他引:1  
罗芬  丁树良  王晓庆 《心理学报》2012,44(3):400-412
多级评分可以提供更多关于被试的信息, 是计算机化自适应测验的一个发展方向, 选题策略是计算机化自适应测验的研究重点。对于多级评分的等级反应模型, 本文拟用区间估计的思想改进近期提出的几种选题策略, 并且将两级评分b-STR和a-STR推广到多级评分以改进最大信息量选题策略。Monte Carlo模拟实验表明在达到或接近原有选题策略测验精度的基础上, 本文提出的几种新选题策略有的能够有效降低测验长度, 有的可以极大降低项目曝光率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号