Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Answer similarity indices were developed to detect pairs of test takers who may have worked together on an exam or instances in which one test taker copied from another. For any pair of test takers, an answer similarity index can be used to estimate the probability that the pair would exhibit the observed response similarity or a greater degree of similarity under the assumption that the test takers worked independently. To identify groups of test takers with unusually similar response patterns, Wollack and Maynes suggested conducting cluster analysis using probabilities obtained from an answer similarity index as measures of distance. However, interpretation of results at the cluster level can be challenging because the method is sensitive to the choice of clustering procedure and only enables probabilistic statements about pairwise relationships. This article addresses these challenges by presenting a statistical test that can be applied to clusters of examinees rather than pairs. The method is illustrated with both simulated and real data.
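
Illustrative sketch (not taken from the article): the pairwise probability described above can be pictured as an upper-tail probability for the number of matching answers. The sketch assumes each item has a known probability that two independent examinees give the same answer; those per-item probabilities and the function name `match_tail_probability` are hypothetical, and this is neither a specific published index nor the cluster-level test the abstract proposes.

```python
def match_tail_probability(match_probs, observed_matches):
    """P(number of matching answers >= observed_matches) under independence.

    match_probs[i] is an assumed probability that two independent examinees
    give the same answer to item i (hypothetical inputs).
    """
    # dist[k] = probability of exactly k matches among the items seen so far
    dist = [1.0]
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # no match on this item
            nxt[k + 1] += mass * p       # match on this item
        dist = nxt
    return sum(dist[observed_matches:])


if __name__ == "__main__":
    # Four items, each with an assumed 0.5 chance of a coincidental match.
    print(match_tail_probability([0.5] * 4, 3))
```

A small tail probability would flag the pair as unusually similar; how such pairwise values are combined into a statement about a whole cluster is exactly what the article addresses and is not attempted here.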

2.
3.
This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to either test form. However, examinees can frequently select their own test forms and tests often differ according to examinees’ abilities. In such cases, concurrent calibration or multiple group IRT modeling without modeling test form selection behavior can yield severely biased results. We propose a model in which test form selection behavior depends on test scores and use a Monte Carlo expectation maximization (MCEM) algorithm. This method provided adequate estimates of testing parameters.

4.
Computerized classification testing (CCT) commonly chooses items maximizing information at the cut score, which yields the most information for decision-making. However, a corollary problem is that all examinees will be given the same set of items, resulting in a high test overlap rate and unbalanced item bank usage, which threatens test security. Moreover, another pivotal issue for CCT is time control. Since both extremely long response times (RTs) and large RT variability across examinees intensify time-induced anxiety, it is crucial to reduce the number of examinees exceeding the time limit and the differences between examinees' test-taking times. To satisfy these practical needs, this paper proposes the novel idea of stage adaptiveness to tailor the item selection process to the decision-making requirement in each step and generate fresh insight into the existing response time selection method. Results indicate that balanced item usage as well as short and stable test times across examinees can be achieved via the new methods.
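
Illustrative sketch (not taken from the paper): selecting the item with maximum information at the cut score can be written in a few lines. The sketch assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and a made-up four-item bank; `info_2pl` and `select_item` are hypothetical names.

```python
import math


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(bank, cut_score, administered):
    """Index of the not-yet-administered item with maximum information at the cut score."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(cut_score, *bank[i]))


if __name__ == "__main__":
    # Hypothetical bank of (a, b) pairs; cut score assumed to be 0.0.
    bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 0.2), (1.2, 1.0)]
    first = select_item(bank, 0.0, set())
    second = select_item(bank, 0.0, {first})
    print(first, second)
```

Because the criterion depends only on the cut score and not on the examinee, every examinee receives the same sequence of items, which is the overlap problem the abstract describes.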

5.
Methods of cognitive diagnostic computerized adaptive testing (CD-CAT) under higher-order cognitive diagnosis models have been developed to simultaneously provide estimates of the attribute mastery statuses of examinees for formative assessment and estimates of a latent continuous trait for overall summative evaluation. In a typical CD-CAT environment, examinees are often subject to a time limit, and the examinees’ response times (RTs) for specific test items can be routinely recorded by custom-made programs. Because examinees are individually administered tailored sets of test items from the item pool, they may experience different levels of speededness during testing and different levels of risk of running out of time. In this study, RTs were considered during the item-selection procedure to control the test speededness and the RTs were treated as useful information for improving latent trait estimation in CD-CAT under the higher-order deterministic input, noisy ‘and’ gate (DINA) model. A modified posterior-weighted Kullback–Leibler (PWKL) method that maximizes the item information per time unit and a shadow-test method that assembles a provisional test subject to a specified time constraint were developed. Two simulation studies were conducted to assess the effects of the proposed methods on the quality of CD-CAT for fixed- and variable-length exams. The results show that, compared with the traditional PWKL method, the proposed methods preserve a lower risk of running out of time while ensuring satisfactory attribute estimation and providing more accurate estimates of the latent trait and speed parameters. Finally, several suggestions for future research are proposed.

6.
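Computer-based tests can record examinees' item response times (RTs). As an important source of auxiliary information, RT is valuable for test development and administration, particularly in computerized adaptive testing (CAT). This article briefly introduces and comments on applications of RT to item selection in CAT and analyzes the feasibility of these techniques in practice. Finally, it discusses current problems in applying RT to CAT item selection and directions for further research.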

7.
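Wang Wenyi (汪文义)  Ding Shuliang (丁树良)  Song Lihong (宋丽红) 《心理学报》(Acta Psychologica Sinica) 2015,47(12):1499-1510
Classification is a core problem in cognitive diagnostic assessment. Discriminant methods based on the distance between observed and ideal response patterns take the deterministic ideal response pattern as the class center; this ignores error and therefore does not make full use of the population distribution information. To make fuller use of that information, improve diagnostic classification, and broaden the applicability of diagnostic assessment, this study proposes a Euclidean distance discriminant method whose class center is the conditional expectation vector of the item response pattern given a knowledge state, together with a method for estimating item response functions under cognitive diagnosis models to obtain this conditional expectation vector. Simulation studies show that the conditional expectation vectors obtained by the item response function estimation method recover the true values well, so the resulting distribution information is fairly accurate; and that when observed response patterns differ greatly from ideal response patterns, the Euclidean distance discriminant method with conditional expectation vectors as class centers outperforms classification methods that use ideal response patterns as class centers (the generalized distance method and the nonparametric method). The study may serve as a reference for cognitive diagnostic classification and equating methods.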
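
Illustrative sketch (not taken from the article): the idea of classifying by Euclidean distance to a conditional expectation vector can be shown on a toy example. The sketch assumes a DINA model with known slip and guess parameters to generate the expected response pattern for each knowledge state; the article instead estimates item response functions, so the DINA assumption, the parameter values, and all function names here are hypothetical.

```python
from itertools import product


def ideal_response(alpha, q_row):
    """1 if knowledge state alpha masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def expected_pattern(alpha, Q, slip, guess):
    """Expected item scores given alpha under an assumed DINA model."""
    return [(1.0 - s) if ideal_response(alpha, q_row) else g
            for q_row, s, g in zip(Q, slip, guess)]


def classify(x, Q, slip, guess):
    """Knowledge state whose expected pattern is closest to x in squared Euclidean distance."""
    n_attributes = len(Q[0])

    def sq_dist(alpha):
        center = expected_pattern(alpha, Q, slip, guess)
        return sum((xi - ci) ** 2 for xi, ci in zip(x, center))

    return min(product([0, 1], repeat=n_attributes), key=sq_dist)


if __name__ == "__main__":
    Q = [[1, 0], [0, 1], [1, 1]]           # hypothetical Q-matrix (3 items, 2 attributes)
    slip, guess = [0.1] * 3, [0.2] * 3     # hypothetical item parameters
    print(classify([1, 0, 0], Q, slip, guess))
```

Replacing `expected_pattern` with the 0/1 ideal response pattern would give the deterministic class centers that the abstract argues ignore error.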

8.
The use of computer-based assessments makes the collection of detailed data that capture examinees’ progress in the tests and time spent on individual actions possible. This article presents a study using process and timing data to aid understanding of an international language assessment and the examinees. Issues regarding test-taking strategies, test speededness, test design, and their relationship to examinees’ demographic backgrounds and performance are also discussed.

9.
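Reviewable cognitive diagnostic computerized adaptive testing (RCD-CAT), which allows examinees to change their answers, helps diagnose examinees' knowledge states more accurately. The item pocket (IP) method gives examinees the opportunity to set items aside and revise their responses, and the modified item pocket (MIP) method rescores the items revised within the IP. A simulation study compared the performance of IP, MIP, stocking Ⅰ, and stocking Ⅱ in RCD-CAT. The results showed that the stocking designs performed best, with stocking Ⅱ slightly better than stocking Ⅰ; the IP and MIP methods had lower correct classification rates than traditional CD-CAT; and the stocking designs have good application prospects in RCD-CAT.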

10.
A speeded item response model is proposed. We consider the situation in which examinees leave the harder items for a later period of a time-limited test. With such a strategy, examinees may not finish answering some of the harder items within the allocated time. The proposed model describes this mechanism by incorporating a speeded-effect term into the two-parameter logistic item response model. A Bayesian estimation procedure for the proposed model using Markov chain Monte Carlo is presented, and its performance relative to the two-parameter logistic item response model in a speeded test is demonstrated through simulations. The methodology is applied to physics examination data of the Department Required Test for college entrance in Taiwan for illustration.

11.
The purpose of the present studies was to evaluate and predict academic cheating with regard to a national examination in a Middle East country. In Study 1, 4,024 students took part and potential cheaters were classified as those having discrepant scores in multiple administrations that exceeded 1 SD in absolute terms. A latent class mixture analysis suggested two pathways for potential cheating: (a) The first path involved students (mostly male) who changed city or region of examination during test taking, and (b) the second path described students (mostly male) who did not change city, region, or center of administration. Study 2 profiled cheaters using a sample of examinees who were actually caught cheating. Participants were 545 students, 253 of whom were caught cheating between 2002 and 2012. Both samples were selected from a pool of 319,219 examinees using random sampling procedures. Results indicated that a 4-class solution best fitted the data as in Study 1. Furthermore, a predictive model was tested with an independent cross-validation sample of 112 examinees (56 cheaters, 56 noncheaters). Results indicated that the model correctly classified 78.57% of the new cheating cases (sensitivity) and 94.64% of noncheaters (specificity).
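
Illustrative arithmetic check (not part of the abstract): with 56 cheaters and 56 noncheaters in the cross-validation sample, the reported sensitivity and specificity should each correspond to a whole number of examinees. The sketch below searches for such counts; the resulting counts are an inference from the reported percentages, not figures stated in the source, and `counts_matching` is a hypothetical helper.

```python
def counts_matching(reported_percent, group_size):
    """Whole-number counts whose rate, rounded to two decimals, equals reported_percent."""
    return [k for k in range(group_size + 1)
            if abs(round(100.0 * k / group_size, 2) - reported_percent) < 1e-9]


if __name__ == "__main__":
    print(counts_matching(78.57, 56))  # candidate number of correctly flagged cheaters
    print(counts_matching(94.64, 56))  # candidate number of correctly cleared noncheaters
```

Under these assumptions the search returns 44 of 56 cheaters and 53 of 56 noncheaters, so both reported rates are consistent with the stated group sizes.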

12.
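Just as different illnesses require different medical techniques for diagnosis, different cognitive structures require correspondingly designed test modes for diagnosis, so that the test achieves high-quality diagnostic assessment. Traditional test formats, however, do not consider the need for diagnostic tests targeted at different cognitive structures, so a "one test paper for everyone" approach falls short in test efficiency. Cognitive diagnostic computerized adaptive testing can administer different items to examinees with different cognitive structures, but the item bank supporting the adaptive process has not been built with items designed for examinees with different cognitive structures, so the item bank is used inefficiently. The key to solving these problems is to explore how to design test modes matched to different cognitive structures. Using Monte Carlo simulation, this study examines test design modes for different cognitive structures under six attribute hierarchy relationships. The results show that (1) under the same attribute hierarchy, the optimal test design mode differs across cognitive structures; and (2) item banks constructed according to the optimal test design modes for different cognitive structures are used more efficiently. Test developers can use these results to optimize the test design mode for each cognitive structure and to guide item bank construction.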

13.
Throughout the world, tests are administered to some examinees who are not fully proficient in the language in which they are being tested. It has long been acknowledged that proficiency in the language in which a test is administered often affects examinees’ performance on a test. Depending on the context and intended uses for a particular assessment, linguistic proficiency may be relevant to the tested construct and subsequent interpretations, or may be a source of construct-irrelevant variance that undermines accurate interpretation of the test performance of linguistic minorities who are not proficient in the language of the assessment. In this article, we highlight key validity issues to be considered when testing linguistic minorities, regardless of whether language is central or construct-irrelevant. We discuss examples of the different types of studies test users and developers could conduct to evaluate the validity of scores of linguistic minorities. These issues span test development and validation activities. We conclude with a list of critical factors to consider in test development and evaluation whenever linguistic minorities are tested.

14.
Can Shao  Jun Li  Ying Cheng 《Psychometrika》2016,81(4):1118-1141
Change-point analysis (CPA) is a well-established statistical method to detect abrupt changes, if any, in a sequence of data. In this paper, we propose a procedure based on CPA to detect test speededness. This procedure is not only able to classify examinees into speeded and non-speeded groups, but also to identify the point at which an examinee starts to speed. Identification of the change point can be very useful. First, it informs decision makers of the appropriate length of a test. Second, by removing the speeded responses, instead of the entire response sequence of an examinee suspected of speededness, ability estimation can be improved. Simulation studies show that this procedure is efficient in detecting both speeded examinees and the speeding point. Ability estimation is dramatically improved by removing speeded responses identified by our procedure. The procedure is then applied to a real dataset for illustration.
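
Illustrative sketch (not the procedure from the paper): a generic change-point scan over one examinee's scored responses gives a feel for the idea. The sketch compares a single Bernoulli success rate against separate rates before and after each candidate point using a likelihood-ratio statistic. It ignores item parameters and ability, which an IRT-based procedure would account for, and it does not supply a critical value; the function names are hypothetical.

```python
import math


def bernoulli_loglik(successes, n):
    """Maximized Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0 or successes == 0 or successes == n:
        return 0.0  # the fitted rate is 0 or 1, so the likelihood is 1
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)


def detect_change_point(responses):
    """Return (k, statistic): the split after the first k responses with the largest
    likelihood-ratio statistic, or (None, 0.0) if no split improves the fit."""
    n = len(responses)
    total = sum(responses)
    null_ll = bernoulli_loglik(total, n)
    best_k, best_stat = None, 0.0
    before = 0
    for k in range(1, n):
        before += responses[k - 1]
        alt_ll = bernoulli_loglik(before, k) + bernoulli_loglik(total - before, n - k)
        stat = 2.0 * (alt_ll - null_ll)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


if __name__ == "__main__":
    # Hypothetical examinee: correct on the first 8 items, incorrect on the last 4.
    print(detect_change_point([1] * 8 + [0] * 4))
```

In practice the maximum statistic would be compared with a critical value before an examinee is flagged, and only responses after the detected point would be treated as speeded.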

15.
In low-stakes assessments, test performance has few or no consequences for examinees themselves, so that examinees may not be fully engaged when answering the items. Instead of engaging in solution behaviour, disengaged examinees might randomly guess or generate no response at all. When ignored, examinee disengagement poses a severe threat to the validity of results obtained from low-stakes assessments. Statistical modelling approaches in educational measurement have been proposed that account for non-response or for guessing, but do not consider both types of disengaged behaviour simultaneously. We bring together research on modelling examinee engagement and research on missing values and present a hierarchical latent response model for identifying and modelling the processes associated with examinee disengagement jointly with the processes associated with engaged responses. To that end, we employ a mixture model that identifies disengagement at the item-by-examinee level by assuming different data-generating processes underlying item responses and omissions, respectively, as well as response times associated with engaged and disengaged behaviour. By modelling examinee engagement with a latent response framework, the model allows assessing how examinee engagement relates to ability and speed as well as identifying items that are likely to evoke disengaged test-taking behaviour. An illustration of the model by means of an application to real data is presented.

16.
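Ding Shuliang (丁树良)  Luo Fen (罗芬)  Dai Haiqi (戴海琦)  Zhu Wei (朱玮) 《心理学报》(Acta Psychologica Sinica) 2007,39(4):730-736
Within the IRT framework, a unidimensional two-parameter logistic multiple-attempt, multiple-item (MAMI) test model is established for dichotomous (0-1) scoring. Compared with the multiple-attempt, single-item (MASI) model given by Spray, MAMI is not only a more refined model but also has a wider range of application; its parameter estimation method also differs, using the EM algorithm to obtain item parameters. Monte Carlo simulation results show that, compared with the practice of correspondingly increasing the number of test items, the MAMI test model gives the same precision of ability estimates but higher precision of item parameter estimates. Compared with the practice of correspondingly increasing the number of examinees, the precision of item parameter estimates is the same, but MAMI gives higher precision of ability parameter estimates. This finding indicates that, under certain conditions, if examinees are allowed to revise their answers and cumulative scoring is used, then even with the number of items unchanged, the precision of ability estimation can match that obtained by doubling the number of items, and the precision of item parameter estimates also improves. These findings are of reference value not only for skill assessment and cognitive ability assessment but also for data processing methods.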

17.
In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic classification models (DCMs). DCMs are a newer class of psychometric models that are designed to classify examinees according to levels of categorical latent traits. We examined the invariance property for general DCMs using the log-linear cognitive diagnosis model (LCDM) framework. We conducted a simulation study to examine the degree to which theoretical invariance of LCDM classifications and item parameter estimates can be observed under various sample and test characteristics. Results illustrated that LCDM classifications and item parameter estimates show clear invariance when adequate model-data fit is present. To demonstrate the implications of this important property, we conducted additional analyses to show that using pre-calibrated tests to classify examinees provided consistent classifications across calibration samples with varying mastery profile distributions and across tests with varying difficulties.

18.
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed‐length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
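
Illustrative sketch (not the derivation from the paper): the three quantities defined above can be computed directly from the sets of items each examinee received. The sketch uses a made-up administration of three fixed-length tests from a six-item bank; the function names are hypothetical, and item pooling is read here as the number of an examinee's items that appear in at least one test of the group.

```python
def exposure_rates(tests, bank_size):
    """Proportion of examinees who received each item in the bank."""
    counts = [0] * bank_size
    for test in tests:
        for item in test:
            counts[item] += 1
    return [c / len(tests) for c in counts]


def item_sharing(group):
    """Number of items common to every examinee's test in the group."""
    return len(set.intersection(*group))


def item_pooling(examinee_test, group):
    """Number of the examinee's items that appear in at least one test of the group."""
    return len(examinee_test & set.union(*group))


if __name__ == "__main__":
    # Hypothetical tests of length 3 drawn from items 0..5.
    tests = [{0, 1, 5}, {0, 1, 3}, {0, 2, 4}]
    print(exposure_rates(tests, 6))
    print(item_sharing(tests))
    print(item_pooling(tests[0], tests[1:]))
```

For fixed-length tests the exposure rates sum to the test length, which is one simple consistency check on any administration log.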

19.
Cognitive diagnosis models of educational test performance rely on a binary Q‐matrix that specifies the associations between individual test items and the cognitive attributes (skills) required to answer those items correctly. Current methods for fitting cognitive diagnosis models to educational test data and assigning examinees to proficiency classes are based on parametric estimation methods such as expectation maximization (EM) and Markov chain Monte Carlo (MCMC) that frequently encounter difficulties in practical applications. In response to these difficulties, non‐parametric classification techniques (cluster analysis) have been proposed as heuristic alternatives to parametric procedures. These non‐parametric classification techniques first aggregate each examinee's test item scores into a profile of attribute sum scores, which then serve as the basis for clustering examinees into proficiency classes. Like the parametric procedures, the non‐parametric classification techniques require that the Q‐matrix underlying a given test be known. Unfortunately, in practice, the Q‐matrix for most tests is not known and must be estimated to specify the associations between items and attributes, risking a misspecified Q‐matrix that may then result in the incorrect classification of examinees. This paper demonstrates that clustering examinees into proficiency classes based on their item scores rather than on their attribute sum‐score profiles does not require knowledge of the Q‐matrix, and results in a more accurate classification of examinees.
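
Illustrative sketch (not taken from the paper): the attribute sum-score profile mentioned above is the examinee's item-score vector multiplied by the Q-matrix. The Q-matrix and item scores below are made up, and `attribute_sum_scores` is a hypothetical name.

```python
def attribute_sum_scores(item_scores, Q):
    """Sum the scores of the items that require each attribute.

    Q[j][k] is 1 if item j requires attribute k, else 0.
    """
    n_attributes = len(Q[0])
    return [sum(x * q_row[k] for x, q_row in zip(item_scores, Q))
            for k in range(n_attributes)]


if __name__ == "__main__":
    # Hypothetical 4-item, 2-attribute Q-matrix and one examinee's 0/1 item scores.
    Q = [[1, 0], [0, 1], [1, 1], [1, 0]]
    print(attribute_sum_scores([1, 0, 1, 1], Q))
```

The profile depends on every entry of Q, so a misspecified Q-matrix changes the clustering input; clustering the raw item scores avoids that dependence, which is the paper's point.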

20.
Examinees who take credentialing tests and other types of high-stakes assessments are usually provided an opportunity to repeat the test if they are unsuccessful on initial attempts. To prevent examinees from obtaining unfair score increases by memorizing the content of specific test items, testing agencies usually assign an alternate form to repeat examinees. Given that the use of multiple forms presents both practical and psychometric challenges, it is important to determine if unwarranted score gains occur. Most research indicates that repeat examinees realize score gains when taking the same form twice; however, the research is far from conclusive, particularly within the context of credentialing. For the present investigations, two samples of repeat examinees were randomly assigned to receive either the same test form or a different, but parallel, form on the second occasion. Study 1 found score gains of about 0.79 SD units for 71 examinees who repeated a certification examination in computed tomography. Study 2 found gains of 0.48 SD units for 765 examinees who repeated a radiography certification examination. In both studies score gains for examinees receiving the parallel test were nearly indistinguishable from score gains for those who received the same test. Factors are identified that may influence the generalizability of these findings to other assessment contexts.
