首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
The concept of tailored testing is defined in general terms and in terms of its major components: item calibration, item selection, and ability estimation. Each of the components is described and the available methods are summarized. The end result is a 12-category classification scheme based on the component parts. Four specific procedures are described. References are presented to suggest that the more adaptive procedures are capable of more precise measurement.  相似文献   

2.
Psychometric methods for accurate and timely detection of item compromise have been a long-standing topic. While Bayesian methods can incorporate prior knowledge or expert inputs as additional information for item compromise detection, they have not been employed in item compromise detection itself. The current study proposes a two-phase Bayesian change-point framework for both stationary and real-time detection of changes in each item's compromise status. In Phase I, a stationary Bayesian change-point model for compromise detection is fitted to the observed responses over a specified time-frame. The model produces parameter estimates for the change-point of each item from uncompromised to compromised, as well as structural parameters accounting for the post-change response distribution. Using the post-change model identified in Phase I, the Shiryaev procedure for sequential testing is employed in Phase II for real-time monitoring of item compromise. The proposed methods are evaluated in terms of parameter recovery, detection accuracy, and detection efficiency under various simulation conditions and in a real data example. The proposed method also showed superior detection accuracy and efficiency compared to the cumulative sum procedure.  相似文献   

3.
Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods preserve a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.  相似文献   

4.
This paper demonstrates that the widely available and routinely used index ‘coefficient alpha if item deleted’ can be misleading in the process of construction and revision of multiple‐component instruments with congeneric measures. An alternative approach to evaluation of scale reliability following deletion of each component in a given composite is outlined that can be recommended in general for scale development purposes. The method provides ranges of plausible values for instrument reliability when dispensing with single components in a tentative composite, and permits testing hypotheses about reliability of resulting scale versions. The proposed procedure is illustrated with an example.  相似文献   

5.
郭磊  刘伟 《心理科学》2018,(1):189-195
Zhang(2013)提出了序贯监测程序(SMP)用以检测CAT中的题目在作答过程中是否发生泄漏。然而,该方法会出现虚报且未关注在题目泄漏后,对能力估计精度产生的影响。本研究在SMP基础上引入个人拟合指标,提出SMP_PFI方法,拟在给定的置信度上核实被SMP标记的题目是否真正泄漏,并探查SMP_PFI方法对能力估计精度与被封存题目数量关系的影响。实验结果表明:新方法能够有效降低SMP单独运行时的一类错误。通过控制CPFI值能够平衡能力估计精度与被封存题目数量之间的关系。  相似文献   

6.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure is evaluated via Monte Carlo simulations. If anchor items are available, we proposed an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate—all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.  相似文献   

7.
8.
Computerized adaptive testing (CAT) was originally proposed to measure θ, usually a latent trait, with greater precision by sequentially selecting items according to the student’s responses to previously administered items. Although the application of CAT is promising for many educational testing programs, most of the current CAT systems were not designed to provide diagnostic information. This article discusses item selection strategies specifically tailored for cognitive diagnostic tests. Our goal is to identify an effective item selection algorithm that not only estimates θ efficiently, but also classifies the student’s knowledge status α accurately. A single-stage item selection method with a dual purpose will be introduced. The main idea is to treat diagnostic criteria as constraints: Using the maximum priority index method to meet these constraints, the CAT system is able to generate cognitive diagnostic feedback in a fairly straightforward fashion. Different priority functions are proposed. Some of them are based on certain information measures, such as Kullback–Leibler information, and others utilize only the information provided by the Q-matrix. An extensive simulation study is conducted, and the results indicate that the information-based method not only yields higher classification rates for cognitive diagnosis, but also achieves more accurate θ estimation. Other constraint controls, such as item exposure rates, are also considered for all the competing methods.  相似文献   

9.
The supplemented EM (SEM) algorithm is applied to address two goodness‐of‐fit testing problems in psychometrics. The first problem involves computing the information matrix for item parameters in item response theory models. This matrix is important for limited‐information goodness‐of‐fit testing and it is also used to compute standard errors for the item parameter estimates. For the second problem, it is shown that the SEM algorithm provides a convenient computational procedure that leads to an asymptotically chi‐squared goodness‐of‐fit statistic for the ‘two‐stage EM’ procedure of fitting covariance structure models in the presence of missing data. Both simulated and real data are used to illustrate the proposed procedures.  相似文献   

10.
毛秀珍  辛涛 《心理学报》2014,46(12):1910-1922
项目曝光控制和内容约束关系到测验安全、测验的信度和效度, 是计算机化自适应测验(Computerized Adaptive Testing, CAT)中两类重要的非统计约束条件。本文在认知诊断CAT中针对内容约束和项目曝光控制要求, 运用5种方法选择测验项目。它们分别是:(1) Monte Carlo方法与项目合格方法相结合, 记为MC-IE; (2) Monte Carlo方法与最大优先指标方法相结合, 记为MC-MPI; (3) Monte Carlo方法与限制阈值方法相结合, 记为MC-RT; (4) Monte Carlo方法与限制进度指标方法相结合, 记为MC-RPG以及(5) Monte Carlo方法与最大后验概率方法相结合, 记为MC-PP。然后通过在线性、收敛、发散、无结构和独立五种属性结构下构建题库并运用重参化融融统和模型模拟被试反应比较它们的选题表现。研究发现, (1) 相同选题方法在不同属性结构下项目曝光率的分布类似, 测量精度按线性、收敛、发散、无结构和独立结构的顺序依次降低; (2) 相同属性结构下, 不同方法的测量精度高低依次为MC-PP、MC-IE、MC-RT、MC-MPI和MC-RPG方法; 项目曝光均匀性优劣依次为MC-RPG、MC-MPI、MC-RT、MC-IE和MC-PP方法。统一量纲值表明, MC-RPG方法的综合表现最好, MC-MPI方法的表现次之。  相似文献   

11.
汪文义  丁树良 《心理科学》2012,35(2):452-456
目前已有研究证明可达阵在认知诊断测验编制中起重要作用,但迄今为止并没有引起普遍注意。本文主要讨论当题库缺少某些可达阵对应的项目类,对原始题的属性向量在线标定的准确性的影响。本文对含6个属性的独立型结构进行了模拟试验,结果显示:如果题库不充要,原始题的属性标定准确性受到影响,题库中非可达阵中项目对标定有一定的弥补作用。间接印证了可达阵在认知诊断题库起到非常重要的作用。  相似文献   

12.
Content balancing is one of the most important issues in computerized classification testing. To adapt to variable-length forms, special treatments are needed to successfully control content constraints without knowledge of test length during the test. To this end, we propose the notions of ‘look-ahead’ and ‘step size’ to adaptively control content constraints in each item selection step. The step size gives a prediction of the number of items to be selected at the current stage, that is, how far we will look ahead. Two look-ahead content balancing (LA-CB) methods, one with a constant step size and another with an adaptive step size, are proposed as feasible solutions to balancing content areas in variable-length computerized classification testing. The proposed LA-CB methods are compared with conventional item selection methods in variable-length tests and are examined with different classification methods. Simulation results show that, integrated with heuristic item selection methods, the proposed LA-CB methods result in fewer constraint violations and can maintain higher classification accuracy. In addition, the LA-CB method with an adaptive step size outperforms that with a constant step size in content management. Furthermore, the LA-CB methods generate higher test efficiency while using the sequential probability ratio test classification method.  相似文献   

13.
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized.Portions of this paper were presented at the 60th annual meeting of the Psychometric Society, Minneapolis, Minnesota, June, 1995. The author is indebted to Wim M. M. Tielen for his computational support.  相似文献   

14.
15.
Knowles ES  Condon CA 《心理评价》2000,12(3):245-252
This article examines item stability when the same item appears in different contexts. The 1st section considers the assumptions in classical test theory and item response theory concerning the relationship between the item and the trait it is presumed to measure. The 2nd section presents contextualist challenges to the measurement theory assumptions about item properties and shows the instability of item characteristics across different testing contexts. The 3rd section describes methods for checking the relationship between items and traits. Classical test methods, item response methods, and structural equation methods for assessing item stability are reviewed. The instability of item characteristics across contexts should caution researchers to assess, and not assume, that items operate the same way on different test versions. Item instability also indicates the need for a more detailed understanding of the psychological processes that occur between item and answer.  相似文献   

16.
Item compromise persists in undermining the integrity of testing, even secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, a couple of recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees that have preknowledge of an item would likely respond more quickly to it than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.  相似文献   

17.
CD–CAT中已有选题策略较注重测验效率,而对题库使用率不够重视。针对此问题,基于DINA模型,引入两种新的选题策略KLED和RHA,同时对HA进行模拟研究。结果显示:PWKL与KLED只在测验效率上具有优势;KLED若按属性向量分层,题库使用率有所提高,KLED比ED更容易推广到其他有显式表达的诊断模型场合;HA、RHA和RP–PWKL可较好兼顾测验效度和题库使用率,但RP-PWKL需设置项目的最大曝光率阈值。两种新选题方法在定长和变长CD-CAT都具有一定的应用价值。  相似文献   

18.
目前参数估计多采用统计方法,存在耗时长、要求被试样本容量大和项目数多等缺点。本文将BP神经网络和降维法相结合,对GRM的项目参数和考生能力参数进行估计。蒙特卡洛模拟结果显示:(1)不管是人多题少还是题多人少,该网络设计下的参数估计精度都较高;(2)可以应用到多个不同等级评分的参数估计中,甚至是超过15个等级的项目参数,估计精度也较高,这是其他参数估计方法所不可比拟的;(3)运行的时长和统计估计方法相比大大缩减。  相似文献   

19.
CD-CAT是CDA同CAT的相结合的产物,适用于课堂教学,是教师补救教学、学生自我学习的重要工具。作为CD-CAT重要组成部分的初始阶段项目选取方法是影响测验判准率的重要因素。本文基于现有研究和CDA的项目区分度提出了四种新的初始阶段项目选取方法:CTTID法、CDI法、CTTIDR*法和CDIR*法。通过模拟研究发现,在定长的CD-CAT下,题库质量是HD-HV下,初始阶段结束时,CTTIDR*法的PCCR比现有的T阵法高了.2999,比PWKL高了.1707,其它题库下趋势相同。整个测验结束时CTTIDR*法的判准率仍然是最高的。在变长的CD-CAT下,最大后验概率大于.7、.8、.9下,CTTIDR*法的被试平均测验长度比T阵法分别缩短了2.6170、2.2347、1.7470道题。  相似文献   

20.
计算机化自适应测验选题策略述评   总被引:2,自引:0,他引:2  
毛秀珍  辛涛 《心理科学进展》2011,19(10):1552-1562
计算机化自适应测验(computerized adaptive testing, CAT)是基于测量理论和计算机技术的一种测验模式。它根据考生的作答反应自适应地选择测验项目。选题策略是CAT的重要组成部分之一, 关系到测量效率、测验安全和测验信、效度等重要问题。根据CAT是否具有非统计约束对传统CAT和认知诊断CAT的选题策略进行了分类介绍, 未来研究应进一步提高选题策略的综合表现、深入探讨多级评分项目和认知诊断CAT的选题策略。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号