Similar literature
1.
Cognitive diagnosis models of educational test performance rely on a binary Q-matrix that specifies the associations between individual test items and the cognitive attributes (skills) required to answer those items correctly. Current methods for fitting cognitive diagnosis models to educational test data and assigning examinees to proficiency classes are based on parametric estimation methods, such as expectation maximization (EM) and Markov chain Monte Carlo (MCMC), that frequently encounter difficulties in practical applications. In response to these difficulties, non-parametric classification techniques (cluster analysis) have been proposed as heuristic alternatives to parametric procedures. These techniques first aggregate each examinee's item scores into a profile of attribute sum scores, which then serves as the basis for clustering examinees into proficiency classes. Like the parametric procedures, the non-parametric techniques require that the Q-matrix underlying a given test be known. In practice, however, the Q-matrix for most tests is not known and must be estimated, which risks a misspecified Q-matrix and, in turn, incorrect classification of examinees. This paper demonstrates that clustering examinees into proficiency classes on the basis of their item scores, rather than their attribute sum-score profiles, does not require knowledge of the Q-matrix and results in more accurate classification of examinees.
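The following is a minimal sketch of the two clustering inputs the abstract contrasts, using only the standard library. It assumes a binary item-score matrix X (examinees × items). For the sum-score route only, it also assumes a Q-matrix (items × attributes), so that the attribute sum-score profile is W = XQ. Plain k-means with fixed starting points stands in for whatever clustering procedure the paper uses. The toy data and function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def attribute_sum_scores(x: Sequence[Sequence[int]], q: Sequence[Sequence[int]]) -> Matrix:
    """W = XQ: per examinee, the number of correct items that measure each attribute."""
    n_attr = len(q[0])
    return [[float(sum(row[j] * q[j][k] for j in range(len(row)))) for k in range(n_attr)]
            for row in x]

def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points: Sequence[Sequence[float]], init_idx: List[int], n_iter: int = 20) -> List[int]:
    """Lloyd's k-means; the initial centers are the points listed in init_idx."""
    centers = [list(points[i]) for i in init_idx]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: 4 examinees, 4 items, 2 attributes.
X_DEMO = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
Q_DEMO = [[1, 0], [1, 0], [0, 1], [0, 1]]

if __name__ == "__main__":
    print(kmeans(X_DEMO, [0, 2]))                                # item scores: Q never used
    print(kmeans(attribute_sum_scores(X_DEMO, Q_DEMO), [0, 2]))  # sum scores: Q required
```

On this toy data both routes produce the same two groups, [0, 0, 1, 1]. The point of the contrast is that the first call never touches Q_DEMO.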

2.
Differential item functioning (DIF) assessment is key in score validation. When DIF is present, scores may not accurately reflect the construct of interest for some groups of examinees, leading to incorrect conclusions from the scores. Given rising immigration and educational policymakers' increasing reliance on cross-national assessments such as the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), and the Progress in International Reading Literacy Study (PIRLS), DIF with respect to native language is of particular interest. However, because languages and cultures differ, assuming that DIF is similar across nations may lead to mistaken conclusions about the impact of immigration status and native language on test performance. The purpose of this study was to use model-based recursive partitioning (MBRP) to investigate uniform DIF in PIRLS items across European nations. Results demonstrated that DIF based on mother's language was present for several items on a PIRLS assessment, but that the patterns of DIF were not the same across all nations.

3.
This study examines separate and concurrent approaches to combining the detection of item parameter drift (IPD) with the estimation of scale transformation coefficients. The context is the common-item nonequivalent groups design with three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two approaches in terms of IPD flagging rates, Type I error and power rates, and recovery of scale transformation coefficients. Results indicate that the two approaches yield similar outcomes with stable anchor sets. However, they can produce dissimilar results with unstable anchor sets because their IPD components perform differently. Further, the findings caution against working backward from equated cut scores to motivate the selection of an anchor set.
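For readers unfamiliar with scale transformation coefficients, the following is a minimal sketch of the mean/sigma method under the common-item nonequivalent groups design. It assumes anchor-item difficulty estimates on the new-form and old-form scales. It computes A = SD(b_old)/SD(b_new) and B = mean(b_old) − A·mean(b_new). It then applies a hypothetical drift screen: drop any anchor whose transformed difficulty misses its old-scale value by more than a threshold, then relink. Both the mean/sigma method and the threshold rule are illustrative stand-ins, not the separate or concurrent IPD procedures the study compares.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

def mean_sigma(b_new: Sequence[float], b_old: Sequence[float]) -> Tuple[float, float]:
    """Mean/sigma coefficients A, B placing new-form parameters on the old scale (b* = A*b + B)."""
    a = pstdev(b_old) / pstdev(b_new)
    return a, mean(b_old) - a * mean(b_new)

def link_with_drift_screen(b_new: Sequence[float], b_old: Sequence[float],
                           threshold: float = 0.5,
                           max_rounds: int = 5) -> Tuple[float, float, List[int]]:
    """Hypothetical screen: drop anchors whose transformed difficulty misses its old value
    by more than `threshold`, then relink on the remaining anchors."""
    keep = list(range(len(b_new)))
    for _ in range(max_rounds):
        a, b = mean_sigma([b_new[i] for i in keep], [b_old[i] for i in keep])
        flagged = [i for i in keep if abs(a * b_new[i] + b - b_old[i]) > threshold]
        if not flagged or len(keep) - len(flagged) < 2:
            break  # stable anchor set, or too few anchors left to relink
        keep = [i for i in keep if i not in flagged]
    # Final coefficients are always computed on the final anchor set.
    a, b = mean_sigma([b_new[i] for i in keep], [b_old[i] for i in keep])
    return a, b, keep
```

Consider b_new = [-1.5, -0.5, 0.5, 1.5, 1.5] and b_old = [-1.0, 0.0, 1.0, 2.0, 0.5], where the fifth anchor has drifted. The first pass flags only that anchor. Relinking on the other four then recovers A = 1 and B = 0.5.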

4.
Computerized classification testing (CCT) commonly chooses items that maximize information at the cut score, which yields the most information for decision-making. A corollary problem, however, is that all examinees receive the same set of items, resulting in a high test overlap rate and unbalanced item bank usage, which threatens test security. Another pivotal issue for CCT is time control. Extremely long response times (RTs) and large RT variability across examinees both intensify time-induced anxiety. It is therefore crucial to reduce the number of examinees who exceed the time limit and the differences among examinees' test-taking times. To meet these practical needs, this paper proposes the novel idea of stage adaptiveness, which tailors the item selection process to the decision-making requirement at each step, and offers fresh insight into the existing response time selection method. Results indicate that the new methods achieve balanced item usage as well as short and stable test times across examinees.
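The item selection rule the abstract starts from can be sketched as follows. The sketch assumes a 3PL bank with the usual D = 1.7 scaling; the bank values are invented. The criterion depends only on the fixed cut score, not on the examinee's responses, so every examinee receives the same items in the same order. That is the overlap problem described above. The stage-adaptive and response-time methods the paper proposes are not shown.

```python
import math
from typing import List, Sequence, Set, Tuple

Item = Tuple[float, float, float]  # (a, b, c) 3PL parameters

def p3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))

def info3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """Fisher information of a 3PL item: D^2 a^2 (Q/P) ((P - c)/(1 - c))^2."""
    p = p3pl(theta, a, b, c, d)
    return (d * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_at_cut(bank: Sequence[Item], cut: float, administered: Set[int]) -> int:
    """Index of the unadministered item with maximum information at the cut score."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info3pl(cut, *bank[i]))

def fixed_sequence(bank: Sequence[Item], cut: float, length: int) -> List[int]:
    """The item sequence every examinee receives: selection never looks at responses."""
    administered: List[int] = []
    for _ in range(length):
        administered.append(select_at_cut(bank, cut, set(administered)))
    return administered

BANK = [(1.0, 0.0, 0.2), (1.5, 0.1, 0.2), (0.8, -1.0, 0.2), (1.2, 2.0, 0.2)]  # invented
```

With a cut score of 0, fixed_sequence(BANK, 0.0, 3) returns [1, 0, 2] for every examinee. This determinism is what stage adaptiveness and exposure control try to break.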

5.
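Research and Implementation of Computerized Adaptive Testing with Cognitive Diagnostic Functions   Total citations: 3; self-citations: 2; citations by others: 1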
Lin Haijing (林海菁), Ding Shuliang (丁树良). Acta Psychologica Sinica (《心理学报》), 2007, 39(4): 747-753
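The key to constructing a computerized adaptive test (CAT) with cognitive diagnostic functions is designing an item selection strategy that differs from that of traditional CAT. This paper adopts a diagnose-first, estimate-ability-later approach. In the diagnostic stage, a state transition diagram describes all knowledge states in a given cognitive domain and the relations among them, and the item selection strategy is built on the graph's depth-first search algorithm. In the ability-refinement stage, each item administered to an examinee matches the examinee's ability estimate and involves only attributes the examinee has mastered. Monte Carlo simulations with three different attribute structures gave good results.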

6.
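This study developed two new item selection strategies for polytomously scored multidimensional computerized adaptive testing (PMCAT): the revised continuous entropy method (RCEM) and the modified posterior expected Kullback-Leibler information (MKB) method. It then compared them with existing PMCAT item selection strategies. Monte Carlo experiments showed that both new strategies estimate more precisely than the original methods, and that RCEM has the lowest exposure rate of all the strategies. The new strategies offer desirable estimation precision and exposure control, providing new methodological support for the practical application of PMCAT.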

7.
Multidimensional computerized adaptive testing (MCAT) has received increasing attention in educational measurement over the past few years. As in all other forms of CAT, item replenishment is an essential part of item bank maintenance and management in MCAT: overexposed or obsolete items are retired over time and replaced with new ones. Moreover, the calibration precision of the new items directly affects the estimation accuracy of examinees' ability vectors. In unidimensional CAT (UCAT) and cognitive diagnostic CAT, online calibration techniques have been developed to calibrate new items effectively. However, the literature contains very little discussion of online calibration in MCAT. This paper therefore proposes new online calibration methods for MCAT based on popular methods used in UCAT. Three representative methods are generalized to MCAT: Method A, the 'one EM cycle' method, and the 'multiple EM cycles' method. Three simulation studies compared the three new methods by manipulating three factors: test length, item bank design, and level of correlation between coordinate dimensions. The results showed that all the new methods recovered the item parameters accurately. The adaptive online calibration designs also showed some improvement over the random design under most conditions.

8.
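Computer-based tests can record examinees' item response times (RTs). As an important source of auxiliary information, RT is valuable for test development and administration, particularly in computerized adaptive testing (CAT). This paper briefly introduces and evaluates applications of RT to item selection in CAT and analyzes the practical feasibility of these techniques. Finally, it discusses current problems in applying RT to CAT item selection and directions for further research.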
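One well-known RT-based selection rule in the CAT literature is to choose the item with the most information per unit of expected response time. The following is a minimal sketch under two assumptions: a 2PL response model, and van der Linden's lognormal RT model, in which log T_ij ~ N(β_i − τ_j, 1/α_i²) and therefore E[T_ij] = exp(β_i − τ_j + 1/(2α_i²)). The bank values are invented. This is one illustrative rule, not a summary of every method the review covers.

```python
import math
from typing import Sequence, Tuple

# (a, b, alpha, beta): 2PL discrimination and difficulty, lognormal RT precision and time intensity
Item = Tuple[float, float, float, float]

def info2pl(theta: float, a: float, b: float) -> float:
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_time(tau: float, alpha: float, beta: float) -> float:
    """Mean of a lognormal with log-mean beta - tau and log-SD 1/alpha."""
    return math.exp(beta - tau + 0.5 / alpha ** 2)

def select(bank: Sequence[Item], theta_hat: float, tau_hat: float, per_time: bool = True) -> int:
    """Maximum information, optionally divided by expected response time."""
    def score(i: int) -> float:
        a, b, alpha, beta = bank[i]
        info = info2pl(theta_hat, a, b)
        return info / expected_time(tau_hat, alpha, beta) if per_time else info
    return max(range(len(bank)), key=score)

BANK = [(1.0, 0.0, 2.0, 4.0), (1.2, 0.0, 2.0, 5.0)]  # invented two-item bank
```

At θ̂ = τ̂ = 0, maximum information picks item 1. Information per expected second picks item 0, because item 1 is expected to take about e ≈ 2.7 times as long.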

9.
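Reviewable cognitive diagnostic computerized adaptive testing (RCD-CAT) allows examinees to revise their answers and helps diagnose their knowledge states more accurately. The item pocket (IP) method lets examinees hold items in a pocket and revise their answers later. The modified IP (MIP) method rescores items revised within the pocket. A simulation study compared the performance of IP, MIP, Stocking I, and Stocking II in RCD-CAT. The Stocking designs performed best, with Stocking II slightly better than Stocking I. The classification accuracy of IP and MIP was lower than that of traditional CD-CAT. The Stocking designs show good prospects for application in RCD-CAT.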

10.
Wang Wenyi (汪文义), Ding Shuliang (丁树良), Song Lihong (宋丽红). Acta Psychologica Sinica (《心理学报》), 2015, 47(12): 1499-1510
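Classification is a core problem in cognitive diagnostic assessment. Discriminant methods based on the distance between an observed response pattern and ideal response patterns use deterministic ideal response patterns as class centers. Because this ignores error, it does not make full use of information about the population distribution. To use that information more fully, improve diagnostic classification, and broaden the applicability of diagnostic assessment, this study proposes a Euclidean-distance discriminant method. Its class centers are the conditional expectation vectors of the item response pattern given each knowledge state. The study also proposes a method for estimating item response functions under cognitive diagnosis models to obtain these conditional expectation vectors. Simulation studies showed that the estimated conditional expectation vectors recovered the true values well and provided accurate distributional information. When observed response patterns differed substantially from ideal response patterns, the method using conditional expectation vectors as class centers outperformed methods using ideal response patterns as class centers (the generalized distance method and the nonparametric method). The study offers a reference for cognitive diagnostic classification and equating methods.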
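To make the contrast between the two kinds of class center concrete, the following is a minimal sketch under the DINA model with known slip s_j and guessing g_j parameters. Treating these parameters as known is a simplifying assumption; the paper instead estimates item response functions. For a knowledge state α, the ideal response on item j is η_j = 1 if α masters every attribute item j requires, else 0. The conditional expectation is E[X_j | α] = 1 − s_j when η_j = 1 and g_j otherwise. An observed pattern is assigned to the state whose center is nearest in Euclidean distance. All data are invented.

```python
import math
from itertools import product
from typing import List, Optional, Sequence, Tuple

def ideal_response(alpha: Sequence[int], q: Sequence[Sequence[int]]) -> List[int]:
    """DINA eta: 1 iff the state masters every attribute the item requires."""
    return [int(all(a >= r for a, r in zip(alpha, row))) for row in q]

def expected_response(alpha, q, slip, guess) -> List[float]:
    """E[X_j | alpha] under DINA: 1 - s_j when eta_j = 1, g_j when eta_j = 0."""
    return [1 - s if e else g for e, s, g in zip(ideal_response(alpha, q), slip, guess)]

def classify(x: Sequence[int], q, slip: Optional[Sequence[float]] = None,
             guess: Optional[Sequence[float]] = None) -> Tuple[int, ...]:
    """Nearest-center classification; uses ideal patterns if slip/guess are omitted."""
    states = list(product([0, 1], repeat=len(q[0])))
    def center(alpha):
        if slip is None:
            return ideal_response(alpha, q)
        return expected_response(alpha, q, slip, guess)
    return min(states, key=lambda al: math.dist(x, center(al)))

Q = [[1, 0], [0, 1], [1, 1], [1, 1]]   # invented Q-matrix
SLIP = [0.1, 0.5, 0.1, 0.5]            # items 2 and 4 slip often
GUESS = [0.1, 0.1, 0.1, 0.4]
```

For x = [1, 0, 1, 0], the ideal-pattern centers assign state (1, 0). The conditional-expectation centers assign (1, 1), because the misses on items 2 and 4 are explained by those items' high slip rates.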

11.
The study evaluated the reliability of pass and fail classifications for several teacher certification tests. Because these tests use a cut score to classify examinees as passing or failing, evaluating the accuracy and consistency of these classifications is important. Classification accuracy and consistency statistics were estimated with the RELCLASS software. Results indicated the following. (1) The 29 teacher certification tests examined had relatively high classification accuracy (0.827 to 0.999) and consistency (0.760 to 0.999). (2) Both classification accuracy and consistency increased as the difference between the mean and the cut score increased. (3) Classification accuracy and consistency were higher for multiple-choice (MC) tests than for tests consisting only of constructed-response (CR) items or of a combination of CR and MC items.

12.
Tan Qingrong (谭青蓉), Wang Daxun (汪大勋), Luo Fen (罗芬), Cai Yan (蔡艳), Tu Dongbo (涂冬波). Acta Psychologica Sinica (《心理学报》), 2021, 53(11): 1286-1300
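Item replenishing is crucial to maintaining the item bank of a cognitive diagnostic computerized adaptive test (CD-CAT), and online calibration is an important way to replenish items. Drawing on the idea of feature selection in data mining, this study proposes an efficient online calibration method based on entropy-based information gain (IGEOCM). The method uses examinees' responses to both new and operational items to jointly estimate the Q-matrix and item parameters of the new items. Monte Carlo simulations evaluated the new method and compared it with the existing online calibration methods SIE, SIE-R-BIC, and RMSEA-N. IGEOCM achieved good calibration precision and estimation efficiency under all conditions and was overall superior to SIE and the other existing methods. It also needed less time than those methods to calibrate new items. In sum, the study provides a more efficient and accurate method for replenishing items in CD-CAT item banks.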
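The feature-selection idea behind IGEOCM can be illustrated with the basic quantity the method is named after. The following is a minimal sketch that assumes examinees' attribute profiles have already been estimated from operational items. The information gain of attribute k for a new item is IG = H(Y) − H(Y | α_k), where Y is the new item's 0/1 response and H is Shannon entropy in bits. Attributes with high gain are candidates for the new item's q-vector. The abstract does not describe how IGEOCM turns gains into a Q-matrix and item parameters, so that step is not reproduced. The data are invented.

```python
import math
from collections import Counter
from typing import Sequence

def entropy(values: Sequence[int]) -> float:
    """Shannon entropy (bits) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(y: Sequence[int], feature: Sequence[int]) -> float:
    """H(Y) - H(Y | feature) for a new item's responses y and one attribute's mastery pattern."""
    n = len(y)
    conditional = 0.0
    for level in set(feature):
        subset = [yi for yi, f in zip(y, feature) if f == level]
        conditional += len(subset) / n * entropy(subset)
    return entropy(y) - conditional

Y_NEW = [1, 1, 0, 0]        # invented responses to a new item
ATTR_1 = [1, 1, 0, 0]       # estimated mastery of attribute 1
ATTR_2 = [1, 0, 1, 0]       # estimated mastery of attribute 2
```

Here attribute 1 perfectly explains the new item's responses (a gain of 1 bit), while attribute 2 explains nothing (a gain of 0). Attribute 1 is therefore the natural candidate for the new item's q-vector.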

13.
Chen Ping (陈平). Acta Psychologica Sinica (《心理学报》), 2016, 48(9): 1184-1198
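Because of its many advantages, online calibration is widely used to calibrate new items in computerized adaptive testing (CAT). Method A is the most intuitive and computationally simplest online calibration method for CAT, but it has an obvious theoretical flaw: during calibration it treats ability estimates as true abilities. This study incorporates two error-correction ideas into Method A: full-function maximum likelihood estimation (FFMLE) and estimation using the consequences of sufficiency (ECSE). The resulting methods, FFMLE-Method A and ECSE-Method A, correct for ability estimation error on theoretical grounds and so overcome Method A's calibration flaw. Simulation results showed the following. (1) Under most conditions, both new methods generally improved calibration precision over Method A, with the largest improvement on short 10-item tests. (2) When the CAT was short or of medium length (10 or 20 items), the two new methods performed very close to MEM, the best-performing method. When the test was long (30 items), ECSE-Method A performed best overall and outperformed MEM. (3) The larger the sample size, the higher the calibration precision of every method.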
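To show concretely what "treating ability estimates as true abilities" means, the following is a minimal sketch of Method A for a single new 2PL item on the logistic metric, with no D constant. The examinees' ability estimates θ̂ are plugged in as if known, and the item's a and b are found by maximum likelihood using Newton-Raphson. The slope-intercept parameterization z = aθ + d (with b = −d/a), the starting values, and the simulated data are assumptions for illustration. The FFMLE and ECSE corrections are not shown. Running the sketch on abilities with added noise shows the attenuation of a that motivates error-corrected variants.

```python
import math
import random
from typing import List, Sequence, Tuple

def method_a_2pl(theta_hat: Sequence[float], y: Sequence[int],
                 n_iter: int = 25) -> Tuple[float, float]:
    """Method A: MLE of (a, b) for one new item, treating theta_hat as true abilities."""
    a, d = 1.0, 0.0  # slope and intercept of z = a*theta + d
    for _ in range(n_iter):
        g_a = g_d = h_aa = h_ad = h_dd = 0.0
        for t, yi in zip(theta_hat, y):
            p = 1.0 / (1.0 + math.exp(-(a * t + d)))
            w = p * (1.0 - p)
            g_a += (yi - p) * t    # score for the slope
            g_d += yi - p          # score for the intercept
            h_aa += w * t * t      # information matrix entries
            h_ad += w * t
            h_dd += w
        det = h_aa * h_dd - h_ad * h_ad
        # Newton step: solve the 2x2 system (information) * step = score in closed form.
        a += (h_dd * g_a - h_ad * g_d) / det
        d += (h_aa * g_d - h_ad * g_a) / det
    return a, -d / a

def simulate(n: int, a: float, b: float, error_sd: float,
             seed: int = 0) -> Tuple[List[float], List[float], List[int]]:
    """Hypothetical data: true abilities, noisy 'CAT estimates', and 2PL responses."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [int(rng.random() < 1.0 / (1.0 + math.exp(-a * (t - b)))) for t in theta]
    theta_hat = [t + rng.gauss(0.0, error_sd) for t in theta]
    return theta, theta_hat, y
```

Calibrating on the true θ recovers a and b closely. Calibrating on θ̂ with estimation error biases the slope toward zero, which is the flaw the error-corrected versions target.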

14.
Lin Zhe (林喆), Chen Ping (陈平), Xin Tao (辛涛). Acta Psychologica Sinica (《心理学报》), 2015, 47(9): 1188-1198
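Allowing item review can promote the practical use of computerized adaptive testing (CAT). Provided that ability estimation precision and test fairness are not compromised, allowing item review in CAT can ease examinees' test anxiety and reduce measurement error caused by irrelevant factors. The block item pocket method combines the successive block method with the item pocket method. It permits item review in CAT and also compensates for the item pocket method's shortcomings. The results showed the following. (1) Under reasonable response strategies, the block item pocket method estimated ability more precisely than the item pocket method at low ability levels. (2) Against a Wainer-like response strategy, the block item pocket method estimated ability more precisely than the item pocket method at all ability levels. (3) As the number of blocks increased, the ability estimation precision of the block item pocket method approached the no-revision baseline.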

15.
16.
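Q-matrix calibration is a prerequisite for cognitive diagnostic assessment. Existing Q-matrix revision methods are not well suited to tests in which only a few items have known attribute vectors. Extended Q-matrix theory holds that each column of the reduced Q-matrix is a Boolean "or" of columns of the reachability matrix R. Building on this, and under certain cognitive assumptions, this study is the first to propose that the latent response columns of the reachability matrix R and the reduced Q-matrix are related by Boolean "and". On this basis it proposes a reachability-matrix-based Q-matrix calibration method. The results are as follows, given one known reachability matrix. When the guessing or slip parameters of the reachability-matrix items were below .20 and the item parameters of the items to be calibrated were below about .30, the recovery rate of Q-matrix elements was generally above .90. Examinee classification accuracy also differed little between the true and estimated Q-matrices. For an independent structure with five attributes, the new method required only a small random sample. An empirical study confirmed the simulation findings. The new method requires experts to specify the Q-matrix for only a few items, namely the already calibrated items whose Q-matrix corresponds to the reachability matrix of the attribute hierarchy.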
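The two Boolean relations the abstract relies on can be checked on a toy hierarchy. The following is a minimal sketch with three assumptions. First, a divergent hierarchy A1 → A2, A1 → A3, given as an adjacency matrix. Second, the reachability matrix R is the reflexive transitive closure of that matrix, computed with Warshall's algorithm, so that column j lists attribute j and its prerequisites. Third, a conjunctive (DINA-type) latent response η = 1 iff all required attributes are mastered. For an item whose q-vector is the Boolean "or" of two R columns, the sketch checks that its latent response column equals the Boolean "and" of theirs. The hierarchy and helper names are hypothetical, and this is not the paper's calibration procedure.

```python
from itertools import product
from typing import List, Sequence, Tuple

def reachability(adj: List[List[int]]) -> List[List[int]]:
    """Reflexive transitive closure of an attribute adjacency matrix (Warshall's algorithm)."""
    n = len(adj)
    r = [[int(i == j or adj[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                r[i][j] = r[i][j] or (r[i][k] and r[k][j])
    return r

def column(r: List[List[int]], j: int) -> List[int]:
    return [r[i][j] for i in range(len(r))]

def permissible_states(r: List[List[int]]) -> List[Tuple[int, ...]]:
    """States in which mastering attribute j implies mastering all its prerequisites."""
    n = len(r)
    return [s for s in product([0, 1], repeat=n)
            if all(s[i] >= s[j] for i in range(n) for j in range(n) if r[i][j])]

def eta(state: Sequence[int], q: Sequence[int]) -> int:
    """Conjunctive latent response: 1 iff every required attribute is mastered."""
    return int(all(a >= need for a, need in zip(state, q)))

def latent_column(states, q) -> List[int]:
    return [eta(s, q) for s in states]

ADJ = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]  # hypothetical hierarchy: A1 -> A2, A1 -> A3
```

Here R = [[1, 1, 1], [0, 1, 0], [0, 0, 1]], and the permissible states are 000, 100, 101, 110, and 111. The item with q = column 2 "or" column 3 = (1, 1, 1) has latent responses [0, 0, 0, 0, 1], which equal the element-wise "and" of the latent columns of R's second and third columns.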

17.
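A Simulation Study of the Optimal Matching of Item Difficulty and Examinee Ability Distributions   Total citations: 2; self-citations: 1; citations by others: 1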
Li Jinbo (李金波), Wang Quan (王权). Acta Psychologica Sinica (《心理学报》), 1998, 31(2): 197-203
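Using Monte Carlo methods, this paper simulates how examinees' ability distributions should be matched with test item difficulty distributions. The analysis considers ability distributions that are normal, positively skewed, or negatively skewed. In each case, pairing the ability distribution with an item difficulty distribution of the same shape gave a higher expected test information value and a higher maximum test information coefficient than other pairings. With matched shapes, the ability point at which the test information curve peaked also agreed more closely with the mode of the ability distribution. Finally, the correlation between estimated and true item parameters was higher.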
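The "expected test information" criterion can be made concrete under a Rasch model. The abstract does not name the model, so this is an assumption. Test information is I(θ) = Σ_i P_i(θ)(1 − P_i(θ)), and its expectation over an ability distribution is approximated by averaging over simulated abilities. The sketch builds a positively skewed difficulty set from midpoint quantiles of a shifted exponential (mean 0, SD 1) and a normal set for comparison. It also reports where each test information curve peaks. The distributions, sizes, and seed are invented, and how large the difference is depends on them.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence

def item_info(theta: float, b: float) -> float:
    """Rasch item information P(1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def total_info(theta: float, difficulties: Sequence[float]) -> float:
    return sum(item_info(theta, b) for b in difficulties)

def expected_test_info(difficulties: Sequence[float], abilities: Sequence[float]) -> float:
    """Test information averaged over a sample of abilities."""
    return sum(total_info(t, difficulties) for t in abilities) / len(abilities)

def info_peak(difficulties: Sequence[float], grid: Sequence[float]) -> float:
    """Grid point at which the test information curve is highest."""
    return max(grid, key=lambda t: total_info(t, difficulties))

def quantiles_skewed(n: int) -> List[float]:
    """Midpoint quantiles of Exp(1) - 1: mean 0, SD 1, positively skewed."""
    return [-math.log(1.0 - (i + 0.5) / n) - 1.0 for i in range(n)]

def quantiles_normal(n: int) -> List[float]:
    return [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    abilities = [rng.expovariate(1.0) - 1.0 for _ in range(5000)]  # positively skewed
    grid = [i / 10 for i in range(-30, 31)]
    for name, diffs in [("skewed", quantiles_skewed(30)), ("normal", quantiles_normal(30))]:
        print(name, round(expected_test_info(diffs, abilities), 3), info_peak(diffs, grid))
```

Comparing the printed values for the two difficulty sets against the same skewed ability sample reproduces, in miniature, the kind of matching comparison the abstract describes.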

18.
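This paper proposes a person-fit statistic R for polytomously scored items. It examines how well R detects six common aberrant response patterns: cheating, guessing, random responding, carelessness, creative responding, and mixed aberrance. It also compares R with the standardized log-likelihood statistic lzp. The results showed the following. (1) When the proportion of aberrant responses was low and the aberrance type was cheating or guessing, R had a significantly higher detection rate than lzp. (2) The detection rates of both statistics rose as test length and the degree of examinee aberrance increased. (3) Under some conditions, R and lzp performed similarly. An empirical data analysis further illustrated how to use the R statistic, and its results suggest that R has good application prospects.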
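For reference, the comparison statistic lzp can be computed as follows. The sketch assumes that the category probabilities P_jk(θ̂) of each polytomous item are already available from a fitted model; the values below are invented. The quantities are:

- l0 = Σ_j log P_{j,x_j}
- E(l0) = Σ_j Σ_k P_jk log P_jk
- Var(l0) = Σ_j [Σ_k P_jk (log P_jk)² − (Σ_k P_jk log P_jk)²]
- lzp = (l0 − E(l0)) / √Var(l0)

Large negative values indicate misfit. The abstract does not define the proposed R statistic, so it is not shown.

```python
import math
from typing import Sequence

def lzp(probs: Sequence[Sequence[float]], responses: Sequence[int]) -> float:
    """Standardized log-likelihood person-fit statistic for polytomous items.

    probs[j][k] is the model probability of category k on item j at the
    examinee's ability estimate; responses[j] is the observed category.
    """
    l0 = expected = variance = 0.0
    for p_j, x in zip(probs, responses):
        logs = [math.log(p) for p in p_j]
        l0 += logs[x]
        mean_j = sum(p * lg for p, lg in zip(p_j, logs))   # E[log P] for item j
        expected += mean_j
        variance += sum(p * lg * lg for p, lg in zip(p_j, logs)) - mean_j ** 2
    return (l0 - expected) / math.sqrt(variance)

PROBS = [[0.7, 0.2, 0.1]] * 3  # invented: three 3-category items, same probabilities
```

Responding in the most likely category on every item gives lzp ≈ 1.10. Responding in the least likely category everywhere gives lzp ≈ −3.70, which would be flagged.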

19.
Xiao Hanmin (肖涵敏), Du Wenjiu (杜文久), Zhang Tingting (张婷婷). Acta Psychologica Sinica (《心理学报》), 2011, 43(12): 1462-1467
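Polytomously scored items are widely used because they provide more information about examinees. Using a polytomously scored mathematics item as an example, this paper first introduces the concept of an item node. It assumes that the probability of a correct response at each item node follows the two-parameter logistic model. On that basis it analyzes three types of polytomously scored items and derives three scoring models, one of which has the same form as the graded response model. Given the forms of polytomously scored items currently used in testing in China, the item node method described here can be used to formulate item scoring models in a unified way.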
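Because one of the derived models has the same form as the graded response model (GRM), a minimal sketch of the GRM may help. Boundary curves are 2PL functions P*_k(θ) = 1/(1 + exp(−a(θ − b_k))) for ordered thresholds b_1 < … < b_m, with P*_0 = 1 and P*_{m+1} = 0. The probability of score k is P*_k − P*_{k+1}. Reading each boundary as the 2PL probability of getting past the k-th item node is only a loose interpretation of the abstract and is an assumption here. The parameter values are invented.

```python
import math
from typing import List, Sequence

def boundary(theta: float, a: float, b: float) -> float:
    """2PL probability of scoring at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def grm_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k), k = 0..m, under Samejima's graded response model."""
    if any(t1 >= t2 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cum = [1.0] + [boundary(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]
```

For example, grm_probs(0.0, 1.5, [-1.0, 0.0, 1.0]) returns four probabilities that sum to 1 and are symmetric about the middle, since θ sits at the center threshold.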

20.
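Cognitive diagnosis computerized adaptive testing (CD-CAT) combines cognitive diagnostic assessment with computerized adaptive testing and has the features of both. To date, research on CD-CAT has focused almost exclusively on dichotomously (0-1) scored data. In practical educational and psychological assessment, however, polytomously scored data are abundant. This study therefore explores techniques for implementing polytomous CD-CAT (PCD-CAT) and proposes two new item selection methods. Simulation experiments compared the new and traditional item selection methods in PCD-CAT. Under fixed-length PCD-CAT, the two new methods achieved the highest pattern classification accuracy. Under variable-length PCD-CAT, the two new methods also achieved the highest test efficiency.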
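To illustrate what an item selection method in CD-CAT does, the following is a minimal sketch of the Shannon entropy (SHE) rule for the dichotomous DINA model. Given the current posterior over knowledge states, the rule picks the item whose expected posterior entropy after the response is smallest. The abstract does not describe the polytomous extensions or the study's two new methods. The DINA model, the known item parameters, and the uniform prior are all assumptions.

```python
import math
from itertools import product
from typing import Dict, Sequence, Set, Tuple

State = Tuple[int, ...]
Item = Tuple[Tuple[int, ...], float, float]  # (q-vector, slip, guess)

def p_correct(state: State, q: Sequence[int], slip: float, guess: float) -> float:
    """DINA success probability for one item."""
    eta = all(a >= r for a, r in zip(state, q))
    return 1.0 - slip if eta else guess

def entropy(post: Dict[State, float]) -> float:
    return -sum(p * math.log(p) for p in post.values() if p > 0)

def update(post: Dict[State, float], item: Item, x: int) -> Dict[State, float]:
    """Posterior over knowledge states after observing response x (0 or 1) on item."""
    q, s, g = item
    unnorm = {st: w * (p_correct(st, q, s, g) if x else 1.0 - p_correct(st, q, s, g))
              for st, w in post.items()}
    total = sum(unnorm.values())
    return {st: w / total for st, w in unnorm.items()}

def select_she(post: Dict[State, float], bank: Sequence[Item], administered: Set[int]) -> int:
    """Unadministered item minimizing expected posterior Shannon entropy."""
    def expected_entropy(j: int) -> float:
        q, s, g = bank[j]
        p1 = sum(w * p_correct(st, q, s, g) for st, w in post.items())
        return p1 * entropy(update(post, bank[j], 1)) + (1 - p1) * entropy(update(post, bank[j], 0))
    return min((j for j in range(len(bank)) if j not in administered), key=expected_entropy)

PRIOR = {st: 0.25 for st in product([0, 1], repeat=2)}       # uniform over 2 attributes
BANK = [((1, 0), 0.1, 0.1), ((1, 1), 0.1, 0.1), ((0, 1), 0.4, 0.4)]  # invented
```

Starting from the uniform prior, the rule picks item 0, which cleanly splits the states on attribute 1. After a correct response to item 0, it picks item 1 over the noisy item 2.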


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP filing no. 09084417)