Similar literature
20 similar documents found.
1.
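Guo Lei, Wang Zhuoran, Wang Feng, Bian Yufang. Acta Psychologica Sinica (《心理学报》), 2014, 46(5): 702-713
Test security and item pool usage are very important in computerized adaptive testing (CAT), especially in high-stakes tests. The traditional SHGT method controls the item exposure rate and the generalized test overlap rate simultaneously, but its item pool usage is poor. The a-stratified method improves item pool usage but does not adequately control overexposed items. This study combines the idea of the a-stratified method with the SHGT method, drawing on the strengths of each, and proposes three new item selection methods: SHGT_a, SHGT_b, and SHGT_c. The results show that: (1) compared with the SHGT method, the new methods all greatly improve item pool usage while effectively controlling the item exposure rate and the generalized test overlap rate; (2) as the preset item exposure rate (rmax) and generalized test overlap rate ( ) increase and the number of examinees sharing items (a) decreases, the accuracy of the new methods' ability estimates tends to rise, and the new methods still maintain much higher item pool usage than the SHGT method; (3) when the correlation between discrimination and difficulty (rab) is large, SHGT_b and SHGT_c outperform SHGT_a in ability estimation accuracy; (4) under different proportions of test content areas, all three new methods estimate examinee ability accurately; (5) compared with the SHGT method, the new methods effectively mitigate the problem of over-controlling item exposure rates.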
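The two quantities this abstract tries to control can be computed directly from administration records. The following is a minimal sketch, assuming each examinee's test is a list of item indices; it uses the conventional pairwise test overlap rate, not the generalized (group-level) overlap rate the SHGT method targets, and the function names are illustrative only.

```python
from itertools import combinations

def exposure_rates(tests, pool_size):
    """Fraction of examinees who were administered each item (indices 0..pool_size-1)."""
    counts = [0] * pool_size
    for administered in tests:
        for item in set(administered):
            counts[item] += 1
    return [c / len(tests) for c in counts]

def mean_pairwise_overlap(tests, test_length):
    """Average proportion of common items over all pairs of examinees."""
    pairs = list(combinations(tests, 2))
    shared = sum(len(set(a) & set(b)) for a, b in pairs)
    return shared / (len(pairs) * test_length)

# Hypothetical records: three examinees, fixed test length 3, pool of 6 items.
tests = [[0, 1, 2], [0, 1, 3], [0, 4, 5]]
rates = exposure_rates(tests, pool_size=6)
overlap = mean_pairwise_overlap(tests, test_length=3)
print(rates)    # item 0 was seen by every examinee, so its rate is 1.0
print(overlap)  # 4 shared items over 3 pairs of 3-item tests, i.e. 4/9
```

An item pool usage figure can be read off the same output, for example as the share of items whose exposure rate is above zero.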

2.
The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP), a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters.

3.
To date, exposure control procedures that are designed to control item exposure and test overlap simultaneously are based on the assumption of item sharing between pairs of examinees. However, examinees may obtain test information from more than one examinee in practice. This larger scope of information sharing needs to be taken into account in refining exposure control procedures. To control item exposure and test overlap among a group of examinees larger than two, the relationship between the two indices needs to be identified first. The purpose of this paper is to analytically derive the relationships between item exposure rate and each of the two forms of test overlap, item sharing and item pooling, for fixed-length computerized adaptive tests. Item sharing is defined as the number of common items shared by all examinees in a group, while item pooling is the number of overlapping items that an examinee has with a group of examinees. The accuracy of the derived relationships was verified using numerical examples. The relationships derived will lay the foundation for future development of procedures to simultaneously control item exposure and item sharing or item pooling among a group of examinees larger than two.
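The two definitions of test overlap given in this abstract translate directly into set operations. The sketch below is only an illustration of those definitions, assuming each test is a list of item identifiers; the function names and the sample data are hypothetical, and the paper's analytical relationships to exposure rate are not reproduced here.

```python
def item_sharing(tests):
    """Number of items common to every examinee in the group."""
    common = set(tests[0])
    for administered in tests[1:]:
        common &= set(administered)
    return len(common)

def item_pooling(target, group):
    """Number of the target examinee's items seen by at least one group member."""
    pooled = set().union(*map(set, group))
    return len(set(target) & pooled)

# Hypothetical group of three earlier examinees and one later examinee.
group = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
target = [1, 4, 6, 7]
print(item_sharing(group))          # items 3 and 4 appear in all three tests
print(item_pooling(target, group))  # items 1, 4 and 6 were seen by someone in the group
```

Item sharing can only shrink as the group grows, whereas item pooling can only grow, which is why the two indices call for separate treatment.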

4.
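Existing item selection strategies in CD-CAT emphasize test efficiency and pay too little attention to item pool usage. To address this problem, two new item selection strategies based on the DINA model, KLED and RHA, are introduced, and HA is also examined in a simulation study. The results show that PWKL and KLED have an advantage only in test efficiency; stratifying KLED by attribute vector improves item pool usage somewhat, and KLED generalizes more easily than ED to other diagnostic models that have explicit expressions; HA, RHA, and RP-PWKL balance test validity and item pool usage well, but RP-PWKL requires a maximum item exposure rate threshold to be set. Both new item selection methods have some practical value in fixed-length and variable-length CD-CAT.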

5.
Content balancing is often required in the development and implementation of computerized adaptive tests (CATs). In the current study, we propose a modified a-stratified method, the a-stratified method with content blocking. As a further refinement of a-stratified CAT designs, the new method incorporates content specifications into item pool stratification. Simulation studies were conducted to compare the new method with three previous item selection methods: the a-stratified method; the a-stratified with b-blocking method; and the maximum Fisher information method with Sympson-Hetter exposure control. The results indicated that the refined a-stratified design performed well in reducing item overexposure rates, balancing item usage within the pool, and maintaining measurement precision, in a situation where all four procedures were forced to balance content.

6.
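For dual-objective CD-CAT, six item discrimination indices (discrimination power D, general discrimination index GDI, odds ratio OR, the 2PL discrimination parameter a, attribute discrimination index ADI, and cognitive diagnostic discrimination index CDI) were each combined with the IPA method to obtain new item selection strategies. A simulation study compared their performance and also examined how well discrimination-based stratification controls item exposure. The results show that all of the new methods clearly improve the correct classification rate of knowledge states and the accuracy of ability estimation, and that stratified item selection markedly improves item pool usage. Overall, OR weighting significantly improves measurement accuracy, and OR-stratified item selection significantly improves the uniformity of item exposure while maintaining measurement accuracy.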

7.
8.
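Compared with the traditional paper-and-pencil test (P&P), computerized adaptive testing (CAT) selects items adaptively according to the examinee's responses, which not only shortens the test but also greatly improves its accuracy. However, the vast majority of current CATs do not allow examinees to revise their answers, mainly because researchers worry that answer revision would reduce the validity of CAT. Allowing revision is consistent with examinees' usual test-taking habits, and revised scores better reflect examinees' true level, so it could further promote the practical use of CAT. Existing research has proposed methods for controlling reviewable CAT from three angles: test design, improved item selection strategies, and model construction. Future research should further explore how these methods compare and combine, and should study reviewable cognitive diagnostic CAT (CD-CAT).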

9.
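Chen Ping, Li Zhen, Xin Tao. Studies of Psychology and Behavior (《心理与行为研究》), 2011, 9(2): 125-132, 153
Item exposure control is one of the important problems that urgently need to be solved in cognitive diagnostic computerized adaptive testing (CD-CAT). Monte Carlo simulation was used to examine item pool usage under five item selection strategies commonly used in CD-CAT: the randomized method, the KL information method, the Shannon entropy method, the posterior-weighted KL information method, and the KL information method that combines posterior weighting and distance weighting. The results show that the four non-random item selection strategies use the item pool unevenly and produce high test overlap rates, which leads to poor test security; the Shannon entropy method always has the highest correct classification rate. In future work, item exposure control techniques from traditional CAT could be incorporated into CD-CAT item selection strategies.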

10.
Wendy M. Yen 《Psychometrika》1987,52(2):275-291
Comparisons are made between BILOG version 2.2 and LOGIST 5.0 Version 2.5 in estimating the item parameters, traits, item characteristic functions (ICFs), and test characteristic functions (TCFs) for the three-parameter logistic model. Data analyzed are simulated item responses for 1000 simulees and one 10-item test, four 20-item tests, and four 40-item tests. LOGIST usually was faster than BILOG in producing maximum likelihood estimates. BILOG almost always produced more accurate estimates of individual item parameters. In estimating ICFs and TCFs BILOG was more accurate for the 10-item test, and the two programs were about equally accurate for the 20- and 40-item tests. I am grateful to Robert J. Mislevy, Martha L. Stocking, and Marilyn S. Wingersky for many helpful comments on an earlier version of this paper. I would also like to thank Hamid Kamrani and Bongmyoung Park for getting LOGIST and BILOG running and keeping them running under changing computer systems at CTB/McGraw-Hill.
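For readers unfamiliar with the quantities being estimated, the following sketch evaluates an item characteristic function under the three-parameter logistic model and a test characteristic function as the sum of the ICFs. It is not taken from BILOG or LOGIST; the scaling constant D = 1.7 and the item parameter values are assumptions chosen for illustration.

```python
import math

def icf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    D is the conventional scaling constant; 1.7 is an assumption here.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcf(theta, items):
    """Expected number-correct score: the sum of the ICFs over all items."""
    return sum(icf_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical two-item test, given as (a, b, c) triples.
items = [(1.0, 0.0, 0.2), (0.8, 0.0, 0.0)]
print(icf_3pl(0.0, 1.0, 0.0, 0.2))  # at theta == b the ICF equals c + (1 - c) / 2
print(tcf(0.0, items))
```

Accuracy of ICF or TCF recovery can then be judged by comparing curves built from estimated parameters with curves built from the generating parameters over a grid of theta values.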

11.
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item-exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive method (Revuelta & Ponsoda, 1998) and the proportional method (Segall, 2004a), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.
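The idea of letting randomness dominate early and information dominate late can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses 2PL Fisher information, a linear weight w = position / test_length, and a random component drawn uniformly between zero and the largest available information value. The abstract studies several weight functions; only this linear one is shown, and all names are hypothetical.

```python
import math
import random

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def progressive_select(theta, items, used, position, test_length, rng):
    """Choose the next item index by mixing a random draw with information.

    position is the number of items already administered, so the weight of
    information, w, grows linearly from 0 as the test progresses (assumed form).
    """
    w = position / test_length
    candidates = [i for i in range(len(items)) if i not in used]
    info = {i: fisher_info_2pl(theta, *items[i]) for i in candidates}
    max_info = max(info.values())
    score = {i: (1.0 - w) * rng.uniform(0.0, max_info) + w * info[i]
             for i in candidates}
    return max(score, key=score.get)

# Hypothetical bank of (a, b) pairs; administer a 3-item test at a fixed theta.
items = [(0.5, 0.0), (2.0, 0.0), (1.0, 0.0), (1.5, 0.5)]
rng = random.Random(0)
used = set()
for position in range(3):
    used.add(progressive_select(0.0, items, used, position, 3, rng))
print(sorted(used))
```

With w = 0 the first item is chosen purely at random, which is what spreads exposure away from the most discriminating items; a real CAT would also update the trait estimate after each response.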

12.
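Two item exposure control methods for fixed-length CD-CAT, HIRP and HIRT, are proposed. They maintain high classification accuracy while keeping item exposure rates reasonable. The new methods are obtained by combining the dichotomization method with the RP and RT methods and making appropriate adjustments. A simulation study compared them with RP, RT, SM, SMIE, RHA, and SDBS. The results show that: (1) HIRP outperforms SM, SMIE, and SDBS in both classification accuracy and item exposure rate; (2) HIRT has slightly worse item exposure rates than RP, SM, SMIE, RHA, and SDBS but higher classification accuracy; (3) HIRP has lower classification accuracy than RT and RP but better item exposure control.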

13.
In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (θ) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed.

14.
Marginal maximum-likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first-level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

15.
The semi-parametric proportional hazards model with crossed random effects has two important characteristics: it avoids explicit specification of the response time distribution by using semi-parametric models, and it captures heterogeneity that is due to subjects and items. The proposed model has a proportionality parameter for the speed of each test taker, for the time intensity of each item, and for subject or item characteristics of interest. It is shown how all these parameters can be estimated by Markov chain Monte Carlo methods (Gibbs sampling). The performance of the estimation procedure is assessed with simulations and the model is further illustrated with the analysis of response times from a visual recognition task.

16.
Random item effects models provide a natural framework for the exploration of violations of measurement invariance without the need for anchor items. Within the random item effects modelling framework, Bayesian tests (Bayes factor, deviance information criterion) are proposed which enable multiple marginal invariance hypotheses to be tested simultaneously. The performance of the tests is evaluated with a simulation study which shows that the tests have high power and low Type I error rate. Data from the European Social Survey are used to test for measurement invariance of attitude towards immigrant items and to show that background information can be used to explain cross-national variation in item functioning.

17.
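Mao Xiuzhen, Xin Tao. Acta Psychologica Sinica (《心理学报》), 2014, 46(12): 1910-1922
Item exposure control and content constraints affect test security, reliability, and validity, and are two important types of non-statistical constraints in computerized adaptive testing (CAT). For cognitive diagnostic CAT with content constraints and item exposure control requirements, this paper uses five methods to select test items: (1) the Monte Carlo method combined with the item eligibility method, denoted MC-IE; (2) the Monte Carlo method combined with the maximum priority index method, denoted MC-MPI; (3) the Monte Carlo method combined with the restrictive threshold method, denoted MC-RT; (4) the Monte Carlo method combined with the restrictive progressive method, denoted MC-RPG; and (5) the Monte Carlo method combined with the maximum posterior probability method, denoted MC-PP. Item pools were then constructed under five attribute structures (linear, convergent, divergent, unstructured, and independent), examinee responses were simulated with the reparameterized unified model (fusion model), and the item selection performance of the methods was compared. The study found that: (1) for the same item selection method, the distribution of item exposure rates is similar across attribute structures, and measurement accuracy decreases in the order linear, convergent, divergent, unstructured, independent; (2) under the same attribute structure, measurement accuracy from highest to lowest is MC-PP, MC-IE, MC-RT, MC-MPI, MC-RPG, and uniformity of item exposure from best to worst is MC-RPG, MC-MPI, MC-RT, MC-IE, MC-PP. A unified-scale index shows that MC-RPG performs best overall, followed by MC-MPI.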

18.
Using an item-response theory-based approach (i.e. likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self-Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at the item level, as well as at the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self-esteem scores disappeared after the DIF was taken into account. In the present study, we discuss the implications of our findings for cross-cultural research and provide suggestions for future studies using the RSES in China.

19.
20.
The response time-based concealed information test can reveal when a person recognizes a relevant item among other, irrelevant items, based on comparatively slower responding. Thereby, if a person is concealing the knowledge about the relevance of this item (e.g., recognizing it as a murder weapon), this deception can be revealed. A recent study, conducted online and using a between-subject design, introduced a significantly enhanced version by including additional items in the task. While this modified version outperformed the original version, it also resulted in a much higher rate of participant dropouts (i.e., participants leaving the experiment's website without completing the task). The grave implication is that the perceived enhancement is perhaps merely due to selective attrition. Therefore, the current experiment replicates the original one, but using a within-subject design. The results show that there is a large enhancement even when selective attrition is prevented.
