首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
A pplications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models – a testlet partial credit model (PCM), multilevel PCM, and PCM – in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.  相似文献   

2.
题组作为众多测验中的一种常见题型,由于项目间存在一定程度的依赖性而违背了局部独立性假设,若用项目反应模型进行参数估计将会出现较大的偏差.题组反应理论将被试与题组的交互作用纳入到模型中,解决了项目间相依性的问题.笔者对题组反应理论的发展、基本原理及其相关研究进行了综述,并将其应用在中学英语考试中.与项目反应理论相对比,结果发现:(1)题组反应模型与项目反应模型在各参数估计值的相关系数较强,尤其是能力参数和难度参数;(2)在置信区间宽度的比较上,题组反应模型在各个参数上均窄于项目反应模型,即题组反应模型的估计精度优于项目反应模型.  相似文献   

3.
马洁  刘红云 《心理科学》2018,(6):1374-1381
本研究通过高中英语阅读测验实测数据,对比分析双参数逻辑斯蒂克模型 (2PL-IRT)和加入不同数量题组的双参数逻辑斯蒂克模型 (2PL-TRT), 探究题组数量对参数估计及模型拟合的影响。结果表明:(1) 2PL-IRT模型对能力介于-1.50到0.50的被试,能力参数估计偏差较大;(2)将题组效应大于0.50的题组作为局部独立题目纳入模型,会导致部分题目区分度参数的低估和大部分题目难度参数的高估;(3)题组效应越大,将其当作局部独立题目纳入模型估计项目参数的偏差越大。  相似文献   

4.
刘玥  刘红云 《心理学报》2012,44(2):263-275
题组模型可以解决传统IRT模型由于题目间局部独立性假设违背时所导致的参数估计偏差。为探讨题组随机效应模型的适用范围, 采用Monte Carlo模拟研究, 分别使用2-PL贝叶斯题组随机效应模型(BTRM)和2-PL贝叶斯模型(BM)对数据进行拟合, 考虑了题组效应、题组长度、题目数量和局部独立题目比例的影响。结果显示:(1) BTRM不受题组效应和题组长度影响, BM对参数估计的误差随题组效应和题组长度增加而增加。(2) BTRM具有一定的普遍性, 且当题组效应大, 题组长, 题目数量大时使用该模型能减少估计误差, 但是当题目数量较小时, 两个模型得到的能力估计误差都较大。(3)当局部独立题目的比例较大时, 两种模型得到的参数估计差异不大。  相似文献   

5.
多维题组效应Rasch模型   总被引:2,自引:0,他引:2  
首先, 本文诠释了“题组”的本质即一个存在共同刺激的项目集合。并基于此, 将题组效应划分为项目内单维题组效应和项目内多维题组效应。其次, 本文基于Rasch模型开发了二级评分和多级评分的多维题组效应Rasch模型, 以期较好地处理项目内多维题组效应。最后, 模拟研究结果显示新模型有效合理, 与Rasch题组模型、分部评分模型对比研究后表明:(1)测验存在项目内多维题组效应时, 仅把明显的捆绑式题组效应进行分离而忽略其他潜在的题组效应, 仍会导致参数的偏差估计甚或高估测验信度; (2)新模型更具普适性, 即便当被试作答数据不存在题组效应或只存在项目内单维题组效应, 采用新模型进行测验分析也能得到较好的参数估计结果。  相似文献   

6.
This paper proposes two unidimensional item response theory (IRT) models for analysing normative forced‐choice personality items. Both models are derived from a common theoretical framework and arise as a result of different assumptions regarding the mechanism of choice. The simplest mechanism gives rise to the one‐parameter normal‐ogive model. The second mechanism gives rise to a new IRT model, which is closely related to the Coombs–Zinnes probabilistic unfolding model. The second model is compared theoretically to the normal‐ogive model in terms of item characteristic curves and amount of item information. Next, procedures for estimating the respondent and the item parameters in the second model are described. Finally, both models are empirically compared by using two well‐known personality measures.  相似文献   

7.
Examinee‐selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non‐ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two‐dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non‐ignorable and to determine how to apply the new model to the data collected. Two follow‐up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non‐ignorable missing data were mistakenly treated as ignorable.  相似文献   

8.
A Two-Tier Full-Information Item Factor Analysis Model with Applications   总被引:2,自引:0,他引:2  
Li Cai 《Psychometrika》2010,75(4):581-612
Motivated by Gibbons et al.’s (Appl. Psychol. Meas. 31:4–19, 2007) full-information maximum marginal likelihood item bifactor analysis for polytomous data, and Rijmen, Vansteelandt, and De Boeck’s (Psychometrika 73:167–182, 2008) work on constructing computationally efficient estimation algorithms for latent variable models, a two-tier item factor analysis model is developed in this research. The modeling framework subsumes standard multidimensional IRT models, bifactor IRT models, and testlet response theory models as special cases. Features of the model lead to a reduction in the dimensionality of the latent variable space, and consequently significant computational savings. An EM algorithm for full-information maximum marginal likelihood estimation is developed. Simulations and real data demonstrations confirm the accuracy and efficiency of the proposed methods. Three real data sets from a large-scale educational assessment, a longitudinal public health survey, and a scale development study measuring patient reported quality of life outcomes are analyzed as illustrations of the model’s broad range of applicability.  相似文献   

9.
Various different item response theory (IRT) models can be used in educational and psychological measurement to analyze test data. One of the major drawbacks of these models is that efficient parameter estimation can only be achieved with very large data sets. Therefore, it is often worthwhile to search for designs of the test data that in some way will optimize the parameter estimates. The results from the statistical theory on optimal design can be applied for efficient estimation of the parameters.A major problem in finding an optimal design for IRT models is that the designs are only optimal for a given set of parameters, that is, they are locally optimal. Locally optimal designs can be constructed with a sequential design procedure. In this paper minimax designs are proposed for IRT models to overcome the problem of local optimality. Minimax designs are compared to sequentially constructed designs for the two parameter logistic model and the results show that minimax design can be nearly as efficient as sequentially constructed designs.  相似文献   

10.
Abstract:  In test operations using IRT (item response theory), items are included in a test before being used to rate subjects and the response data is used to estimate their item parameters. However, this method of test operation may lead to item content leakage and an adequate test operation can become difficult. To address this problem, Ozaki and Toyoda (2005, 2006 ) developed item difficulty parameter estimation methods that use paired comparison data from the perspective of the difficulty of items as judged by raters familiar with the field. In the present paper, an improved method of item difficulty parameter estimation is developed. In this new method, an item for which the difficulty parameter is to be estimated is compared with multiple items simultaneously, from the perspective of their difficulty. This is not a one-to-one comparison but a one-to-many comparison. In the comparisons, raters are informed that items selected from an item pool are ordered according to difficulty. The order will provide insight to improve the accuracy of judgment.  相似文献   

11.
在测量具有层阶结构的潜质时, 标准项目反应模型对项目参数估计和能力参数估计都具有较低的效率, 多维项目反应模型虽然在估计第一阶潜质时具有高效性, 但没有考虑到潜质层阶的情况, 所以它不适合用来处理具有层阶结构的潜质; 而高阶项目反应模型在处理这种具有层阶结构的潜质时, 不仅能够高效准确地对项目参数和能力参数进行估计, 而且还能同时获得高阶潜质与低阶潜质。目前存在的高阶项目反应模型有高阶DINA模型、高阶双参数正态肩型层阶模型、高阶逻辑斯蒂模型、多级评分的高阶项目反应模型和高阶题组模型。未来对高阶项目反应模型的研究方向应注意多水平高阶项目反应模型、项目内多维情况下的高阶项目反应模型以及高阶认知诊断模型。  相似文献   

12.
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.  相似文献   

13.
当前认知诊断领域还缺少对包含题组的测验进行诊断分析的研究, 即已开发的认知诊断模型无法合理有效地处理含有题组效应的测验数据, 且已开发的题组反应模型也不具有对被试知识结构或认知过程进行诊断的功能。针对该问题, 本文尝试性地将多维题组效应向量参数引入线性Logistic模型中, 同时开发了属性间具有补偿作用的和属性间具有非补偿作用的多维题组效应认知诊断模型。模拟研究结果显示新模型合理有效, 与线性Logistic模型和DINA模型对比研究后表明:(1)作答数据含有题组效应时, 忽略题组效应会导致项目参数的偏差估计并降低对目标属性的判准率; (2)新模型更具普适性, 即便当作答数据不存在题组效应时, 采用新模型进行测验分析亦能得到很好的项目参数估计结果且不影响对目标属性的判准率。整体来看, 新模型既具有认知诊断功能又可有效处理题组效应。  相似文献   

14.
A Bayesian random effects model for testlets   总被引:4,自引:0,他引:4  
Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimuli (e.g. a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicates that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design.The authors wish to thank Robert Mislevy, Andrew Gelman and Donald B. Rubin for their helpful suggestions and comments, Ida Lawrence and Miriam Feigenbaum for providing us with the SAT data analyzed in section 5, and to the two anonymous referees for their careful reading and thoughtful suggestions on an earlier draft. We are also grateful to the Educational Testing service for providing the resources to do this research.  相似文献   

15.
吴锐  丁树良  甘登文 《心理学报》2010,42(3):434-442
题组越来越多地出现在各类考试中, 采用标准的IRT模型对有题组的测验等值, 可能因忽略题组的局部相依性导致等值结果的失真。为解决此问题, 我们采用基于题组的2PTM模型及IRT特征曲线法等值, 以等值系数估计值的误差大小作为衡量标准, 以Wilcoxon符号秩检验为依据, 在几种不同情况下进行了大量的Monte Carlo模拟实验。实验结果表明, 考虑了局部相依性的题组模型2PTM绝大部分情况下都比2PLM等值的误差小且有显著性差异。另外, 用6种不同等值准则对2PTM等值并评价了不同条件下等值准则之间的优劣。  相似文献   

16.
丁树良  罗芬  戴海琦  朱玮 《心理学报》2007,39(4):730-736
在IRT框架下,建立了0-1评分方式下单维双参数Logistic多题多做(MAMI)测验模型。与Spray给出的一题多做(MASI)模型相比,MAMI不仅模型更加精致,而且扩展了适用范围,参数估计方法也不同,采用EM算法求取项目参数。Monte Carlo模拟结果显示,应用MAMI测验模型与测验题量作相应增加的作法相比,两者给出的能力估计精度相同,但MAMI模型给出的项目参数估计精度更高。如果将MAMI测验模型与被试人数相应增加的作法相比,项目参数的估计精度相同,但MAMI给出的能力参数估计精度更高。这个发现表明,在一定条件下若允许修改答案,并采用累加式记分方式,纵使题量不变,也可使能力估计的精度相当于题量增加一倍的估计精度,而项目参数估计精度也会提高。这些发现不仅对技能评价和认知能力评价有参考价值,而且对数据的处理方式也有参考价值  相似文献   

17.
当观测指标变量为二分分类数据时,传统的因素分析方法不再适用。作者简要回顾了SEM框架下的分类数据因素分析模型和IRT框架下的测验题目和潜在能力的关系模型,并对两种框架下主要采用的参数估计方法进行了总结。通过两个模拟研究,比较了SEM框架下GLSc和MGLSc估计方法与IRT框架下MML/EM估计方法的差异。研究结果表明:(1)三种方法中,GLSc得到参数估计的偏差最大,MGLSc和MML/EM估计方法相差不大;(2)随着样本量增大,各种项目参数估计的精度均提高;(3)项目因素载荷和难度估计的精度受测验长度的影响;(4)项目因素载荷和区分度估计的精度受总体因素载荷(区分度)高低的影响;(5)测验项目中阈值的分布会影响参数估计的精度,其中受影响最大的是项目区分度。(6)总体来看,SEM框架下的项目参数估计精度较IRT框架下项目参数估计的精度高。此外,文章还将两种方法在实际应用中应该注意的问题提供了一些建议。  相似文献   

18.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.  相似文献   

19.
The generalized graded unfolding model (GGUM) is an item response theory (IRT) model that implements symmetric, nonmonotonic, single-peaked item characteristic curves. The GGUM is appropriate for measuring individual differences for a variety of psychological constructs, especially attitudes. Like other IRT models, the location and scale (i.e., the metric) of parameter estimates from the GGUM are data dependent. Therefore, parameter estimates from alternative calibrations will generally not be comparable, even when responses to the same items are analyzed. GGUMLINK is a computer program developed to reexpress parameter estimates from two separate GGUM calibrations in a common metric. In this way, the results from separate calibrations of model parameters can be compared. GGUMLINK can secure a common metric by using one of five methods that have recently been generalized to the GGUM. The GGUMLINK executable program is available free and may be downloaded from http://www.education.umd.edu/EDMS/tutorials/index.html.  相似文献   

20.
A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号