首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Marginal maximum‐likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first‐level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.  相似文献   

2.
In this paper, it is shown that various violations of the 2-PL model and the nominal response model can be evaluated using the Lagrange multiplier test or the equivalent efficient score test. The tests presented here focus on violation of local stochastic independence and insufficient capture of the form of the item characteristic curves. Primarily, the tests are item-oriented diagnostic tools, but taken together, they also serve the purpose of evaluation of global model fit. A useful feature of Lagrange multiplier statistics is that they are evaluated using maximum likelihood estimates of the null-model only, that is, the parameters of alternative models need not be estimated. As numerical examples, an application to real data and some power studies are presented.  相似文献   

3.
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected copied responses through probability sampling and bootstrapping. In doing so, the bias in copier ability estimation will be determined and used to update the ability estimate for calculating the modified omega (ωm), a new statistic based on the ω. The performance of ωm and ω were compared in a Monte Carlo simulation study under 40 typical testing conditions (2 test lengths x 4 sample sizes x 5 levels of copying). In almost all conditions, the ωm had the same or better controlled Type I error and higher power than ω. The increase in power was particularly eminent when the source's estimated ability was higher than the copier and when 20% or 30% of items were copied. These findings support the use of the ωm as a replacement of ω to detect answer copying in multiple choice exams.  相似文献   

4.
基于改进的Wald统计量,将适用于两群组的DIF检测方法拓展至多群组的项目功能差异(DIF)检验;改进的Wald统计量将分别通过计算观察信息矩阵(Obs)和经验交叉相乘信息矩阵(XPD)而得到。模拟研究探讨了此二者与传统计算方法在多个群组下的DIF检验情况,结果表明:(1)Obs和XPD的一类错误率明显低于传统方法,DINA模型估计下Obs和XPD的一类错误率接近理论水平;(2)样本量和DIF量较大时,Obs和XPD具有与传统Wald统计量大体相同的统计检验力。  相似文献   

5.
詹沛达 《心理科学》2019,(1):170-178
随着心理与教育测量研究的发展和科技的进步,计算机化(大规模)测验逐渐受到人们的关注。为探究在计算机化多维测验中如何利用作答时间数据来辅助评估多维潜在能力,以及为我国义务教育阶段教育质量监测提供数据分析方法上的理论支持。本研究以2012年和2015年国际学生能力评估(PISA)计算机化数学测验数据为例,提出了一种可同时利用作答时间和作答精度数据的联合作答与时间的多维Rasch模型。根据新模型对PISA数据的分析结果,表明引入作答时间数据,不仅有助于提高模型参数的估计精度,还有助于数据分析者利用被试的作答时间信息来做进一步的决策和干预(e.g., 对异常作答行为或预备知识的诊断)。  相似文献   

6.
Hou,de la Torre和Nandakumar(2014)提出可以使用Wald统计量检验DIF,但其结果的一类错误率存在过度膨胀的问题。本研究中提出了一个使用观察信息矩阵进行计算的改进后的Wald统计量。结果表明:(1)使用观察信息矩阵计算的这一改进后的Wald统计量在DIF检验中具有良好的一类错误控制率,尤其是在项目具有较高区分能力的时候,解决了以往研究中一类错误率过度膨胀的问题。(2)随着样本量的增加以及DIF量的增大,使用观察信息矩阵计算Wald统计量的统计检验力也在增加。  相似文献   

7.
The Rasch model predicts that an individual's ability level is invariant over subtests of the total test, and thus, all subtests measure the same latent trait. A person test of this invariance hypothesis is discussed that is uniformly most powerful and standardized in the sense that the conditional distribution of the test statistic, given a particular level of ability, does not depend on the absolute value of the examinee's ability parameter. The test can be routinely performed by applying a computer program designed by and obtainable from the author. Finally, a suboptimal test is derived that is extremely easy to use, and an overall group test of the invariance hypothesis discussed. All tests considered do not rely on asymptotic approximations; hence, they may be applied when the test is of only moderate length and the group of examinees is small.  相似文献   

8.
Person fit statistics are considered for dichotomous item response models. The asymptotic null distribution is derived for statistics which are linear in the item responses, and in which the ability parameter is replaced by an estimate. This allows the asymptotically correct standardization of linear person fit statistics with estimated ability parameter. The fact that the ability parameter is estimated usually decreases the asymptotic variance.I am indebted to Herbert Hoijtink and three anonymous referees for their comments on a previous version.  相似文献   

9.
刘红云  李冲  张平平  骆方 《心理学报》2012,44(8):1124-1136
测量工具满足等价性是进行多组比较的前提, 测量等价性的检验方法主要有基于CFA的多组比较法和基于IRT的DIF检验两类方法。文章比较了单维测验情境下基于CCFA的DIFFTEST检验方法和基于IRT模型的IRT-LR检验方法, 以及多维测验情境下DIFFTEST和基于MIRT的卡方检验方法的差异。通过模拟研究的方法, 比较了几种方法的检验力和第一类错误, 并考虑了样本总量、样本量的组间均衡性、测验长度、阈值差异大小以及维度间相关程度的影响。研究结果表明:(1)在单维测验下, IRT-LR是比DIFFTEST更为严格的检验方法; 多维测验下, 在测验较长、测验维度之间相关较高时, MIRT-MG比DIFFTEST更容易检验出项目阈值的差异, 而在测验长度较短、维度之间相关较小时, DIFFTEST的检验力反而略高于MIRT-MG方法。(2)随着阈值差值增加, DIFFTEST、IRT-LR和MIRT-MG三种方法的检验力均在增加, 当阈值差异达到中等或较大时, 三种方法都可以有效检验出测验阈值的不等价性。(3)随着样本总量增加, DIFFTEST、IRT-LR和MIRT-MG方法的检验力均在增加; 在总样本量不变, 两组样本均衡情况下三种方法的检验力均高于不均衡的情况。(4)违背等价性题目个数不变时, 测验越长DIFFTEST的检验力会下降, 而IRT-LR和MIRT-MG检验力则上升。(5) DIFFTEST方法的一类错误率平均值接近名义值0.05; 而IRT-LR和MIRT-MG方法的一类错误率平均值远低于0.05。  相似文献   

10.
A pplications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models – a testlet partial credit model (PCM), multilevel PCM, and PCM – in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.  相似文献   

11.
When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).  相似文献   

12.
多维项目反应理论因其模型本身的天然优势及其兼具因素分析与项目反应理论于一身的优点,而被广大研究者及应用者所重视.本研究在前人研究基础上,重点讨论MIRT多维能力及能力间相关矩阵的参数估计问题.研究采用Monte Carlo模拟方法进行,在三因素完全随机设计(4 ×3×3)下,使用MCMC算法,探讨测验维度数、维度间的相关大小和测验项目数三个因素对MIRT能力及其相关矩阵估计的影响.  相似文献   

13.
测验垂直等值是指将测试同一心理特质的不同水平的测验转换到同一个分数量尺上的过程。IRT与MIRT是实现垂直等值的主要方法。IRT无需假设被试的能力分布, 参数估计不依赖于样本, 是构建垂直量表的有效方法, 但测验不满足单维假设时其应用受到限制。MIRT结合IRT和因素分析的特点对IRT进行了拓展, 可更有效估计多维测验的项目参数和被试能力参数, 在垂直等值中有重要应用。已有研究主要探讨IRT和MIRT在垂直等值应用中的适用性、标定方法和参数估计方法, 比较研究两种方法的特性。未来研究应纳入更多变量条件进行比较研究, 拓展方法的应用。  相似文献   

14.
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.  相似文献   

15.
In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantial different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed.  相似文献   

16.
The simultaneous and nonparametric estimation of latent abilities and item characteristic curves is considered. The asymptotic properties of ordinal ability estimation and kernel smoothed nonparametric item characteristic curve estimation are investigated under very general assumptions on the underlying item response theory model as both the test length and the sample size increase. A large deviation probability inequality is stated for ordinal ability estimation. The mean squared error of kernel smoothed item characteristic curve estimates is studied and a strong consistency result is obtained showing that the worst case error in the item characteristic curve estimates over all items and ability levels converges to zero with probability equal to one.  相似文献   

17.
This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.  相似文献   

18.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.  相似文献   

19.
多维题组效应Rasch模型   总被引:2,自引:0,他引:2  
首先, 本文诠释了“题组”的本质即一个存在共同刺激的项目集合。并基于此, 将题组效应划分为项目内单维题组效应和项目内多维题组效应。其次, 本文基于Rasch模型开发了二级评分和多级评分的多维题组效应Rasch模型, 以期较好地处理项目内多维题组效应。最后, 模拟研究结果显示新模型有效合理, 与Rasch题组模型、分部评分模型对比研究后表明:(1)测验存在项目内多维题组效应时, 仅把明显的捆绑式题组效应进行分离而忽略其他潜在的题组效应, 仍会导致参数的偏差估计甚或高估测验信度; (2)新模型更具普适性, 即便当被试作答数据不存在题组效应或只存在项目内单维题组效应, 采用新模型进行测验分析也能得到较好的参数估计结果。  相似文献   

20.
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters. Simulated data sets were analyzed via two joint and two marginal Bayesian estimation procedures. The marginal Bayesian estimation procedures yielded consistently smaller root mean square differences than the joint Bayesian estimation procedures for item and ability estimates. As the sample size and test length increased, the four Bayes procedures yielded essentially the same result.The authors wish to thank the Editor and anonymous reviewers for their insightful comments and suggestions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号