1.

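毛秀珍 辛涛 《心理学报》2013,45(6):694-703

Item exposure rates bear on item bank construction and test security, which makes them an important consideration in computerized adaptive testing (CAT). For cognitive diagnostic CAT, this study first proposes a stratified multistage (SM) item selection method that stratifies the item bank by item information, following the idea of the a-stratified method in traditional CAT. It then combines the SM method with the item eligibility (IE) method to obtain the SMIE method. On this basis, a simulation study compares the item selection performance of the SM, IE, SMIE, maximum modified priority index (MMPI), restrictive threshold (RT), and restrictive progressive (RPG) methods. Overall, measurement precision from highest to lowest was IE, SM, SMIE, RT, RPG, and MMPI; uniformity of the item exposure distribution from best to worst was MMPI, RPG, SMIE, RT, SM, and IE. The SMIE and RT methods struck a good balance between measurement precision and uniform item exposure.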

2.

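Item selection strategies that combine a-stratification with control of both item exposure and generalized test overlap rate

郭磊 王卓然 王丰 边玉芳 《心理学报》2014,46(5):702-713

Test security and item bank usage are very important in computerized adaptive testing, especially in high-stakes tests. The traditional SHGT method controls item exposure rate and generalized test overlap rate simultaneously, but its item bank usage is poor. The a-stratified method improves item bank usage but does not adequately control overexposed items. This study combines the idea of a-stratification with the SHGT method, drawing on the strengths of each, and proposes three new item selection methods: SHGT_a, SHGT_b, and SHGT_c. The results show that: (1) compared with the SHGT method, the new methods all greatly improve item bank usage while effectively controlling item exposure rate and generalized test overlap rate; (2) as the preset item exposure rate (rmax) and generalized test overlap rate ( ) increase and the number of sharing examinees a decreases, the precision of ability estimation under the new methods tends to rise, and the new methods still maintain high item bank usage relative to the SHGT method; (3) when the correlation between discrimination and difficulty (rab) is large, SHGT_b and SHGT_c outperform SHGT_a in ability estimation precision; (4) under different proportions of test content areas, all three new methods estimate ability with good precision; (5) compared with the SHGT method, the new methods effectively mitigate the over-control of item exposure rates.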
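
The a-stratification idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the technique: items are sorted into strata of ascending discrimination (a), and within a stratum the unused item whose difficulty (b) is closest to the current ability estimate is chosen. The function names and the toy parameters are hypothetical, and the SHGT_a, SHGT_b, and SHGT_c methods themselves are not reproduced here.

```python
from typing import List, Sequence, Set


def stratify_by_a(a: Sequence[float], n_strata: int) -> List[List[int]]:
    """Split item indices into n_strata groups of ascending discrimination."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = -(-len(order) // n_strata)  # ceiling division
    return [order[k * size:(k + 1) * size] for k in range(n_strata)]


def pick_item(stratum: Sequence[int], b: Sequence[float],
              theta: float, used: Set[int]) -> int:
    """Within one stratum, pick the unused item with difficulty closest to theta."""
    candidates = [i for i in stratum if i not in used]
    return min(candidates, key=lambda i: abs(b[i] - theta))


# Toy item bank (hypothetical parameters).
a = [0.5, 1.8, 0.9, 1.2, 0.7, 1.5]
b = [0.0, 0.3, -1.0, 1.0, 0.4, -0.2]

strata = stratify_by_a(a, n_strata=3)
first_item = pick_item(strata[0], b, theta=0.5, used=set())
print(strata, first_item)
```

Early stages of the test would draw from the low-a strata and later stages from the high-a strata, which is what spreads usage across the bank.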

3.

Investigating the relationship between item exposure and test overlap: Item sharing and item pooling

Shu‐Ying Chen Pui‐Wa Lei 《The British journal of mathematical and statistical psychology》2010,63(1):205-226

To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed‐length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
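
The two overlap notions defined in this abstract can be made concrete with a small numerical sketch. The functions `item_sharing` and `item_pooling` follow the verbal definitions above. The pairwise check uses a simple counting identity for fixed-length tests, mean overlap = N·Σr²/((N−1)·L) − 1/(N−1), where N is the number of examinees, L the test length, and r the item exposure rates; it covers only the pairwise case and is not the group-level relationship derived in the paper. All names and the toy data are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Set


def exposure_rates(tests: Sequence[Set[int]], pool_size: int) -> List[float]:
    """Proportion of examinees who saw each item."""
    n = len(tests)
    return [sum(i in t for t in tests) / n for i in range(pool_size)]


def mean_overlap_pairwise(tests: Sequence[Set[int]], test_length: int) -> float:
    """Average proportion of common items over all examinee pairs."""
    pairs = list(combinations(tests, 2))
    return sum(len(s & t) for s, t in pairs) / (len(pairs) * test_length)


def mean_overlap_from_exposure(rates: Sequence[float], n: int, test_length: int) -> float:
    """Same quantity computed from exposure rates only (counting identity)."""
    return n * sum(r * r for r in rates) / ((n - 1) * test_length) - 1 / (n - 1)


def item_sharing(group: Sequence[Set[int]]) -> int:
    """Number of items common to every examinee in the group."""
    return len(set.intersection(*group))


def item_pooling(examinee: Set[int], group: Sequence[Set[int]]) -> int:
    """Number of the examinee's items seen by at least one group member."""
    return len(examinee & set.union(*group))


# Four examinees, test length 3, pool of 6 items (toy data).
tests = [{0, 1, 2}, {0, 1, 3}, {0, 2, 4}, {1, 3, 5}]
rates = exposure_rates(tests, pool_size=6)
direct = mean_overlap_pairwise(tests, test_length=3)
via_rates = mean_overlap_from_exposure(rates, n=4, test_length=3)
print(direct, via_rates)
print(item_sharing(tests[:3]), item_pooling(tests[3], tests[:3]))
```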

4.

Item selection methods with exposure and time control for computerized classification test

Yingshi Huang He Ren Ping Chen 《The British journal of mathematical and statistical psychology》2023,76(1):52-68

Computerized classification testing (CCT) commonly chooses items maximizing information at the cut score, which yields the most information for decision-making. However, a corollary problem is that all examinees will be given the same set of items, resulting in high test overlap rate and unbalanced item bank usage, which threatens test security. Moreover, another pivotal issue for CCT is time control. Since both the extremely long response time (RT) and large RT variability across examinees intensify time-induced anxiety, it is crucial to reduce the number of examinees exceeding the time limitation and the differences between examinees' test-taking times. To satisfy these practical needs, this paper proposes the novel idea of stage adaptiveness to tailor the item selection process to the decision-making requirement in each step and generate fresh insight into the existing response time selection method. Results indicate that a balanced item usage as well as short and stable test times across examinees can be achieved via the new methods.

5.

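A comparison of item selection methods with nonstatistical constraints in cognitive diagnostic CAT

毛秀珍 辛涛 《心理学报》2014,46(12):1910-1922

Item exposure control and content constraints bear on test security and on test reliability and validity; they are two important classes of nonstatistical constraints in computerized adaptive testing (CAT). This paper selects test items in cognitive diagnostic CAT with five methods that address content constraints and item exposure control: (1) the Monte Carlo method combined with the item eligibility method, denoted MC-IE; (2) the Monte Carlo method combined with the maximum priority index method, denoted MC-MPI; (3) the Monte Carlo method combined with the restrictive threshold method, denoted MC-RT; (4) the Monte Carlo method combined with the restrictive progressive index method, denoted MC-RPG; and (5) the Monte Carlo method combined with the maximum posterior probability method, denoted MC-PP. Item banks were then constructed under five attribute structures (linear, convergent, divergent, unstructured, and independent), and examinee responses were simulated with a reparameterized unified (fusion) model to compare the methods' item selection performance. The study found that (1) for the same item selection method, item exposure rates were distributed similarly across attribute structures, and measurement precision decreased in the order linear, convergent, divergent, unstructured, and independent; (2) within the same attribute structure, measurement precision from highest to lowest was MC-PP, MC-IE, MC-RT, MC-MPI, and MC-RPG, while uniformity of item exposure from best to worst was MC-RPG, MC-MPI, MC-RT, MC-IE, and MC-PP. Values placed on a common scale indicate that MC-RPG performed best overall, followed by MC-MPI.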

6.

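Item selection methods incorporating item response time in computerized adaptive testing

郭治辰 汪大勋 蔡艳 涂冬波 《心理科学》2021,(5):1241-1248

Computer-based tests can record examinees' item response times (RT). As an important source of auxiliary information, RT is valuable for test development and administration, particularly in computerized adaptive testing (CAT). This paper briefly introduces and comments on applications of RT to item selection in CAT and analyzes the practical feasibility of these techniques. Finally, it discusses open problems in applying RT to CAT item selection and directions for further research.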

7.

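Test design patterns for examinees with different cognitive structures

彭亚风 罗照盛 李喻骏 高椿雷 《心理学报》2018,50(1):130-140

Just as different illnesses call for different medical techniques for diagnosis, different cognitive structures call for correspondingly designed test patterns, so that the test delivers high-quality diagnostic assessment. Traditional test forms, however, ignore the need for diagnosis targeted at different cognitive structures, so giving every examinee the same paper is inefficient. Cognitive diagnostic computerized adaptive testing can administer different items to examinees with different cognitive structures, but the item bank that supports the adaptive process does not contain items designed for those different structures, which leads to low item bank usage efficiency. The key to solving these problems is to explore how to design test patterns matched to different cognitive structures. This study used Monte Carlo simulation to examine test design patterns for different cognitive structures under six attribute hierarchies. The results show that (1) under the same attribute hierarchy, the optimal test design pattern differs across cognitive structures; (2) item banks built according to the optimal test design patterns for different cognitive structures are used more efficiently. Test developers can use these results to optimize the test design pattern for each cognitive structure and to guide item bank construction.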

8.

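New item selection methods for polytomously scored cognitive diagnostic computerized adaptive testing

高旭亮 王芳 龚毅 《心理科学》2021,(3):728-736

Cognitive diagnosis computerized adaptive testing (CD-CAT) combines cognitive diagnostic assessment with computerized adaptive testing and has the features of both. Almost all current CD-CAT research focuses on dichotomously (0-1) scored data. In practical educational and psychological assessment, however, polytomously scored data are abundant. This study therefore examines techniques for implementing polytomous CD-CAT (PCD-CAT) and proposes two new item selection methods. Simulation experiments compared the new and traditional item selection methods in PCD-CAT. The results show that under fixed-length PCD-CAT the two new methods achieved the highest pattern classification accuracy, and under variable-length PCD-CAT they also achieved the highest test efficiency.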

9.

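An efficient new online calibration method for CD-CAT: entropy-based information gain and the EM perspective

谭青蓉 汪大勋 罗芬 蔡艳 涂冬波 《心理学报》2021,53(11):1286-1300

Item replenishing is vital to maintaining the item bank of cognitive diagnostic computerized adaptive testing (CD-CAT), and online calibration is an important way to replenish items. Drawing on the idea of feature selection in data mining, this study proposes an efficient online calibration method based on entropy-based information gain (denoted IGEOCM), which uses examinees' responses to both new and old items to jointly estimate the Q-matrix and item parameters of the new items. A Monte Carlo simulation verified the new method and compared it with the existing online calibration methods SIE, SIE-R-BIC, and RMSEA-N. The results show that IGEOCM achieved good item calibration accuracy and estimation efficiency under all experimental conditions and overall outperformed SIE and the other existing methods; IGEOCM also needed less time to calibrate new items than those methods. In sum, the study provides a more efficient and accurate method for replenishing items in CD-CAT item banks.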

10.

Comparing single-pool and multiple-pool designs regarding test security in computerized testing

Zhang J Chang HH Yi Q 《Behavior research methods》2012,44(3):742-752

This article compares the use of single- and multiple-item pools with respect to test security against item sharing among some examinees in computerized testing. A simulation study was conducted to make a comparison among different pool designs using the item selection method of maximum item information with the Sympson-Hetter exposure control and content balance. The results from the simulation study indicate that two-pool designs have a better degree of resistance to item sharing than does the single-pool design in terms of measurement precision in ability estimation. This article further characterizes the conditions under which employing a multiple-pool design is better than using a single, whole pool in terms of minimizing the number of compromised items encountered by examinees under a randomized item selection method. Although no current computerized testing program endorses the randomized item selection method, the results derived in this study can shed some light on item pool designs regarding test security for all item selection algorithms, especially those that try to equalize or balance item exposure rates by employing a randomized item selection method locally, such as the a-stratified-with-b-blocking method.
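
The Sympson-Hetter exposure control named in this abstract can be sketched in a few lines. In its usual description, each item carries an exposure control parameter between 0 and 1, and an item selected by the information criterion is actually administered only with that probability; otherwise the next-best item is tried. The sketch below assumes the information-ranked list is already available, and the parameter values are hypothetical (in practice they are typically tuned through repeated simulations, which is not shown).

```python
import random
from typing import Dict, Optional, Sequence


def sh_administer(ranked_items: Sequence[int],
                  k: Dict[int, float],
                  rng: random.Random) -> Optional[int]:
    """Walk down the information-ranked list and administer item i
    with probability k[i]; return None if every candidate is passed over."""
    for item in ranked_items:
        if rng.random() < k[item]:
            return item
    return None


rng = random.Random(0)
ranked = [7, 3, 5]                # best item first (hypothetical ranking)
k = {7: 0.3, 3: 0.8, 5: 1.0}      # hypothetical exposure control parameters
chosen = sh_administer(ranked, k, rng)
print(chosen)
```

Setting a parameter to 1.0 leaves that item uncontrolled, while smaller values pass over a popular item for a share of the examinees who would otherwise receive it.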

11.

Does Standard Deviation Matter? Using “Standard Deviation” to Quantify Security of Multistage Testing

Chun Wang Yi Zheng Hua-Hua Chang 《Psychometrika》2014,79(1):154-174

With the advent of web-based technology, online testing is becoming a mainstream mode in large-scale educational assessments. Most online tests are administered continuously in a testing window, which may pose test security problems because examinees who take the test earlier may share information with those who take the test later. Researchers have proposed various statistical indices to assess the test security, and one of the most often used indices is the average test-overlap rate, which was further generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as the means (that is, the expected proportion of common items among examinees) and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean, but also the standard deviation (SD) of test overlap rate, as we advocate in this paper. The standard deviation of test overlap rate adds important information to the test security profile, because for the same mean, a large SD reflects that certain groups of examinees share more common items than other groups. In this study, we analytically derived the lower bounds of the SD under MST, with the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same between MST and CAT, the SD of test overlap tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under the single-pool versus the multiple-pool designs; both analytical and simulation studies show that the non-overlapping multiple-pool design will slightly increase the security risk.
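
The point that two designs can share a mean overlap rate yet differ in SD can be seen in a toy example. The sketch below is an illustration only: the two sets of tests are hand-made (hypothetical), with a "clustered" design in which pairs of examinees receive identical forms and a "spread" design in which overlap is distributed evenly. It does not reproduce the paper's analytical lower bounds.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import List, Sequence, Set


def pairwise_overlap_rates(tests: Sequence[Set[int]], test_length: int) -> List[float]:
    """Proportion of common items for every pair of examinees."""
    return [len(s & t) / test_length for s, t in combinations(tests, 2)]


# Four examinees, test length 2 (toy data).
clustered = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]   # two groups with identical forms
spread = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]      # overlap spread across pairs

rates_clustered = pairwise_overlap_rates(clustered, test_length=2)
rates_spread = pairwise_overlap_rates(spread, test_length=2)

print(mean(rates_clustered), pstdev(rates_clustered))
print(mean(rates_spread), pstdev(rates_spread))
```

Both designs have the same mean overlap, but the clustered design has the larger SD because a few pairs share everything while most share nothing.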

12.

Incorporating randomness in the Fisher information for improving item‐exposure control in CATs

Juan Ramón Barrada Julio Olea Vicente Ponsoda Francisco José Abad 《The British journal of mathematical and statistical psychology》2008,61(2):493-513

The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item‐exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive method (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
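
The idea of shifting weight from randomness to information as the test progresses can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses the two-parameter logistic model for Fisher information, a linear weight function (the paper compares several functions), and hypothetical names and item parameters.

```python
import math
import random
from typing import List, Set, Tuple


def fisher_info_2pl(a: float, b: float, theta: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def progressive_pick(items: List[Tuple[float, float]], theta: float,
                     position: int, test_length: int,
                     used: Set[int], rng: random.Random) -> int:
    """Score = (1 - w) * random component + w * information, where the weight w
    grows linearly from 0 (first item) to 1 (last item). Assumed weight function."""
    w = position / (test_length - 1) if test_length > 1 else 1.0
    candidates = [i for i in range(len(items)) if i not in used]
    info = {i: fisher_info_2pl(items[i][0], items[i][1], theta) for i in candidates}
    max_info = max(info.values())
    # The random component is drawn on the same scale as the information values.
    score = {i: (1.0 - w) * rng.uniform(0.0, max_info) + w * info[i]
             for i in candidates}
    return max(candidates, key=lambda i: score[i])


items = [(0.5, 0.0), (2.0, 0.0), (1.0, 1.0)]   # (a, b) pairs, hypothetical
rng = random.Random(0)
first = progressive_pick(items, 0.0, position=0, test_length=3, used=set(), rng=rng)
last = progressive_pick(items, 0.0, position=2, test_length=3, used=set(), rng=rng)
print(first, last)
```

At the last position the rule reduces to maximum-information selection, so the high-discrimination item is chosen; at the first position the choice is purely random.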

13.

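The block item pocket method for allowing item review in CAT

林喆 陈平 辛涛 《心理学报》2015,47(9):1188-1198

Allowing item review can promote the practical use of computerized adaptive testing (CAT). Provided that ability estimation precision and test fairness are not compromised, allowing item review in CAT can relieve examinees' test anxiety and reduce measurement error caused by irrelevant factors. The block item pocket method combines the successive block method with the item pocket method; it not only allows item review in CAT but also remedies shortcomings of the item pocket method. The results show that (1) under a reasonable answering strategy, the block item pocket method estimates more precisely than the item pocket method at low ability levels; (2) against strategies similar to the Wainer strategy, the block item pocket method estimates more precisely than the item pocket method at all ability levels; (3) as the number of blocks increases, the ability estimation precision of the block item pocket method approaches the no-revision baseline.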

14.

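A comparison of item selection strategies for computerized adaptive testing based on the GPCM

刘珍 丁树良 林海菁 《心理学报》2008,40(5):618-625

Item selection strategy is an important topic in research on computerized adaptive testing (CAT); its quality bears directly on the reliability, validity, and security of a test. Much CAT research and application rests on dichotomous (0-1) scoring models, and item selection strategies for polytomously scored CAT have rarely been reported. CAT research based on the GRM has been carried out in China, but no research on CAT based on the GPCM has yet been reported. Using computer simulation, this paper compares four item selection strategies for CAT under the generalized partial credit model (GPCM) across a range of conditions. The results show that when examinee ability is normally distributed, the performance of an item selection strategy depends strongly on the distribution of the item step parameters: (1) when the item step parameters all follow a normal distribution, the strategy that matches ability to the item step parameters performs best; (2) when the item step parameters all follow a uniform distribution, the strategy that matches ability to the mean of the item step parameters performs best.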
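
For readers unfamiliar with the generalized partial credit model, the category probabilities it assigns can be computed as in the sketch below. It uses the standard form of the model, in which the log-odds of adjacent categories are a·(θ − b_v), and omits any scaling constant such as D = 1.7. The function name and example parameters are hypothetical, and the four item selection strategies compared in the paper are not reproduced.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: float, a: float, steps: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) under the GPCM.

    steps holds the step parameters b_1..b_m; category k has the unnormalized
    log-weight sum_{v<=k} a * (theta - b_v), with category 0 fixed at 0.
    """
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    top = max(z)                       # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


probs = gpcm_probs(theta=0.5, a=1.2, steps=[-0.5, 0.3, 1.1])
print(probs, sum(probs))
```

With a single step parameter the model reduces to the two-parameter logistic model, which is a quick sanity check on the implementation.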

15.

CD-CAT item selection strategies that balance test efficiency and item bank usage

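汪文义 丁树良 宋丽红 《心理科学》2014,37(1):212-216

Existing item selection strategies in CD-CAT emphasize test efficiency and pay too little attention to item bank usage. To address this, two new item selection strategies, KLED and RHA, are introduced on the basis of the DINA model, and HA is also examined by simulation. The results show that PWKL and KLED have an advantage only in test efficiency; stratifying by attribute vector improves the item bank usage of KLED somewhat, and KLED generalizes more easily than ED to other diagnostic models that have an explicit expression; HA, RHA, and RP-PWKL balance test efficiency and item bank usage well, but RP-PWKL requires a maximum item exposure rate threshold to be set. The two new item selection methods have some applied value in both fixed-length and variable-length CD-CAT.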
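
The DINA model on which these strategies are built has a simple response function, sketched below for orientation. An examinee who has mastered every attribute the item requires (per the item's Q-matrix row) answers correctly with probability 1 − slip, and anyone else with probability guess. The function name and parameter values are hypothetical, and the KLED, RHA, and HA selection indices are not reproduced here.

```python
from typing import Sequence


def dina_prob(alpha: Sequence[int], q: Sequence[int],
              slip: float, guess: float) -> float:
    """P(correct response) under the DINA model.

    alpha: examinee's attribute mastery pattern (0/1 per attribute)
    q:     item's Q-matrix row (1 = attribute required)
    """
    masters_all_required = all(a_k >= q_k for a_k, q_k in zip(alpha, q))
    return 1.0 - slip if masters_all_required else guess


# Item requires attributes 1 and 2 (hypothetical slip and guess values).
p_master = dina_prob([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2)
p_nonmaster = dina_prob([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2)
print(p_master, p_nonmaster)
```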

16.

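New methods for balancing attribute convergence in cognitive diagnostic computerized adaptive testing

孙小坚 王钰彤 张世夷 辛涛 《心理科学》2005,(5):1236-1244

Two new methods for balancing attribute convergence in cognitive diagnostic computerized adaptive testing (MABI and RTA) are proposed, and a simulation study systematically examines and compares them with existing methods (ABI, IABI, and RABI). The results show that (1) the new methods yield higher accuracy and more balanced item usage than methods that ignore attribute convergence; (2) the new methods are slightly less accurate than ABI and RABI but use items more evenly; (3) relative to IABI, the new methods each have advantages in accuracy and item usage under different item selection strategies. In sum, the two new methods balance measurement accuracy, item usage, and item bank exposure well.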

17.

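Monitoring item bank quality in CAT by combining Bayesian methods with the sequential monitoring procedure

郭磊 刘伟 《心理科学》2018,(1):189-195

Zhang (2013) proposed the sequential monitoring procedure (SMP) to detect whether items in CAT have been compromised during administration. However, the method produces false alarms and does not address how item compromise affects the precision of ability estimation. Building on SMP, this study introduces a person-fit index and proposes the SMP_PFI method, which aims to verify, at a given confidence level, whether items flagged by SMP have really been compromised, and to examine how SMP_PFI affects the relationship between ability estimation precision and the number of items withdrawn from use. The results show that the new method effectively reduces the Type I error of SMP run alone, and that controlling the CPFI value balances ability estimation precision against the number of withdrawn items.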

18.

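New methods for balancing attribute convergence in cognitive diagnostic computerized adaptive testing

孙小坚 王钰彤 张世夷 辛涛 《心理科学》2019,(5):1236-1244

Two new methods for balancing attribute convergence in cognitive diagnostic computerized adaptive testing (MABI and RTA) are proposed, and a simulation study systematically examines and compares them with existing methods (ABI, IABI, and RABI). The results show that (1) the new methods yield higher accuracy and more balanced item usage than methods that ignore attribute convergence; (2) the new methods are slightly less accurate than ABI and RABI but use items more evenly; (3) relative to IABI, the new methods each have advantages in accuracy and item usage under different item selection strategies. In sum, the two new methods balance measurement accuracy, item usage, and item bank exposure well.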

19.

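Development of item selection strategies for multidimensional CAT with polytomously scored items

韩雨婷 高旭亮 汪大勋 蔡艳 涂冬波 《心理科学》2018,(6):1500-1507

This study develops two new item selection strategies for polytomous multidimensional computerized adaptive testing (PMCAT), the revised continuous entropy (RCEM) method and the modified posterior expected KL information (MKB) method, and compares them with previous PMCAT item selection strategies. Monte Carlo results show that the two new strategies estimate more precisely than the original methods, and that RCEM has the lowest exposure rate of all the strategies. The new strategies offer good estimation precision and exposure control, providing new methodological support for applying PMCAT in practice.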

20.

Developing new online calibration methods for multidimensional computerized adaptive testing

Ping Chen Chun Wang Tao Xin Hua‐Hua Chang 《The British journal of mathematical and statistical psychology》2017,70(1):81-117

Multidimensional computerized adaptive testing (MCAT) has received increasing attention over the past few years in educational measurement. Like all other formats of CAT, item replenishment is an essential part of MCAT for its item bank maintenance and management, which governs retiring overexposed or obsolete items over time and replacing them with new ones. Moreover, calibration precision of the new items will directly affect the estimation accuracy of examinees’ ability vectors. In unidimensional CAT (UCAT) and cognitive diagnostic CAT, online calibration techniques have been developed to effectively calibrate new items. However, there has been very little discussion of online calibration in MCAT in the literature. Thus, this paper proposes new online calibration methods for MCAT based upon some popular methods used in UCAT. Three representative methods, Method A, the ‘one EM cycle’ method and the ‘multiple EM cycles’ method, are generalized to MCAT. Three simulation studies were conducted to compare the three new methods by manipulating three factors (test length, item bank design, and level of correlation between coordinate dimensions). The results showed that all the new methods were able to recover the item parameters accurately, and the adaptive online calibration designs showed some improvements compared to the random design under most conditions.