Similar literature
 20 similar documents found
1.
The supplemented EM (SEM) algorithm is applied to address two goodness‐of‐fit testing problems in psychometrics. The first problem involves computing the information matrix for item parameters in item response theory models. This matrix is important for limited‐information goodness‐of‐fit testing and it is also used to compute standard errors for the item parameter estimates. For the second problem, it is shown that the SEM algorithm provides a convenient computational procedure that leads to an asymptotically chi‐squared goodness‐of‐fit statistic for the ‘two‐stage EM’ procedure of fitting covariance structure models in the presence of missing data. Both simulated and real data are used to illustrate the proposed procedures.

2.
3.
Bartholomew and Leung proposed a limited‐information goodness‐of‐fit test statistic (Y) for models fitted to sparse 2^P contingency tables. The null distribution of Y was approximated using a chi‐squared distribution by matching moments. The moments were derived under the assumption that the model parameters were known in advance and it was conjectured that the approximation would also be appropriate when the parameters were to be estimated. Using maximum likelihood estimation of the two‐parameter logistic item response theory model, we show that the effect of parameter estimation on the distribution of Y is too large to be ignored. Consequently, we derive the asymptotic moments of Y for maximum likelihood estimation. We show using a simulation study that when the null distribution of Y is approximated using moments that take into account the effect of estimation, Y becomes a very useful statistic to assess the overall goodness of fit of models fitted to sparse 2^P tables.
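To make the notion of a sparse 2^P table concrete, the following is a minimal sketch, not taken from the paper: it simulates binary responses under the two‐parameter logistic model and counts how many of the 2^P possible response patterns actually occur. The item parameters, sample size, and function names (`irf_2pl`, `simulate_patterns`) are illustrative assumptions.

```python
import math
import random


def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_patterns(n_persons, items, seed=0):
    """Simulate binary response patterns.

    items is a list of (a, b) pairs; abilities are drawn from N(0, 1).
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        pattern = tuple(
            int(rng.random() < irf_2pl(theta, a, b)) for a, b in items
        )
        patterns.append(pattern)
    return patterns


# Hypothetical item parameters for P = 10 items (assumed for illustration).
items = [(1.0, -1.0 + 0.2 * k) for k in range(10)]
patterns = simulate_patterns(500, items)

n_cells = 2 ** len(items)        # number of cells in the 2^P table
n_observed = len(set(patterns))  # cells with at least one observation
print(f"{n_observed} of {n_cells} cells are non-empty")
```

With 500 respondents and 1024 cells, most cells are necessarily empty, which is the situation in which full‐table chi‐squared approximations break down and limited‐information statistics become attractive.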

4.
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier statistic can take both the effects of estimation of the item parameters and the estimation of the person parameters into account. The Lagrange multiplier statistic has an asymptotic χ²-distribution. The Type I error rate and power are investigated using simulation studies. Results show that test statistics that ignore the effects of estimation of the persons’ ability parameters have decreased Type I error rates and power. Incorporating a correction to account for the effects of the estimation of the persons’ ability parameters results in acceptable Type I error rates and power characteristics; incorporating a correction for the estimation of the item parameters has very little additional effect. It is investigated to what extent the three models give comparable results, both in the simulation studies and in an example using data from the NEO Personality Inventory-Revised.

5.
Applications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models – a testlet partial credit model (PCM), multilevel PCM, and PCM – in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.

6.
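Forced-choice (FC) tests are widely used in non-cognitive assessment because they can control the response biases associated with traditional Likert-type methods. However, the traditional scoring of forced-choice tests produces ipsative data, which have long been criticized as unsuitable for comparisons between individuals. In recent years, the development of several forced-choice IRT models has allowed researchers to obtain approximately normative data from forced-choice tests, renewing the interest of researchers and practitioners in these models. First, six of the more mainstream forced-choice IRT models are classified and introduced according to the decision model and item response model they adopt. Then, the models are compared and summarized from two perspectives: model construction and parameter estimation methods. Next, three areas of applied research are reviewed: parameter invariance testing, computerized adaptive testing (CAT), and validity research. Finally, it is suggested that future research go deeper in four directions: model extension, parameter invariance testing, forced-choice CAT, and validity research.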

7.
Testing the fit of finite mixture models is a difficult task, since asymptotic results on the distribution of likelihood ratio statistics do not hold; for this reason, alternative statistics are needed. This paper applies the π* goodness of fit statistic to finite mixture item response models. The π* statistic assumes that the population is composed of two subpopulations – those that follow a parametric model and a residual group outside the model; π* is defined as the proportion of population in the residual group. The population was divided into two or more groups, or classes. Several groups followed an item response model and there was also a residual group. The paper presents maximum likelihood algorithms for estimating item parameters, the probabilities of the groups and π*. The paper also includes a simulation study on goodness of recovery for the two‐ and three‐parameter logistic models and an example with real data from a multiple choice test.

8.
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters. Simulated data sets were analyzed via two joint and two marginal Bayesian estimation procedures. The marginal Bayesian estimation procedures yielded consistently smaller root mean square differences than the joint Bayesian estimation procedures for item and ability estimates. As the sample size and test length increased, the four Bayes procedures yielded essentially the same result. The authors wish to thank the Editor and anonymous reviewers for their insightful comments and suggestions.

9.
The semi‐parametric proportional hazards model with crossed random effects has two important characteristics: it avoids explicit specification of the response time distribution by using semi‐parametric models, and it captures heterogeneity that is due to subjects and items. The proposed model has a proportionality parameter for the speed of each test taker, for the time intensity of each item, and for subject or item characteristics of interest. It is shown how all these parameters can be estimated by Markov chain Monte Carlo methods (Gibbs sampling). The performance of the estimation procedure is assessed with simulations and the model is further illustrated with the analysis of response times from a visual recognition task.

10.
Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model‐checking method, a popular Bayesian model‐checking tool, to examine item fit for the above‐mentioned models. An item fit plot, comparing the observed and predicted proportion‐correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p‐values (which are natural Bayesian p‐values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above‐mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.

11.
Two types of global testing procedures for item fit to the Rasch model were evaluated using simulation studies. The first type incorporates three tests based on first‐order statistics: van den Wollenberg's Q1 test, Glas's R1 test, and Andersen's LR test. The second type incorporates three tests based on second‐order statistics: van den Wollenberg's Q2 test, Glas's R2 test, and a non‐parametric test proposed by Ponocny. The Type I error rates and the power against the violation of parallel item response curves, unidimensionality and local independence were analysed in relation to sample size and test length. In general, the outcomes indicate a satisfactory performance of all tests, except the Q2 test which exhibits an inflated Type I error rate. Further, it was found that both types of tests have power against all three types of model violation. A possible explanation is the interdependencies among the assumptions underlying the model.

12.
This paper proposes two unidimensional item response theory (IRT) models for analysing normative forced‐choice personality items. Both models are derived from a common theoretical framework and arise as a result of different assumptions regarding the mechanism of choice. The simplest mechanism gives rise to the one‐parameter normal‐ogive model. The second mechanism gives rise to a new IRT model, which is closely related to the Coombs–Zinnes probabilistic unfolding model. The second model is compared theoretically to the normal‐ogive model in terms of item characteristic curves and amount of item information. Next, procedures for estimating the respondent and the item parameters in the second model are described. Finally, both models are empirically compared by using two well‐known personality measures.

13.
Two different item response theory model frameworks have been proposed for the assessment and control of response styles in rating data. According to one framework, response styles can be assessed by analysing threshold parameters in Rasch models for ordinal data and in mixture‐distribution extensions of such models. A different framework is provided by multi‐process item response tree models, which can be used to disentangle response processes that are related to the substantive traits and response tendencies elicited by the response scale. In this tutorial, the two approaches are reviewed, illustrated with an empirical data set of the two‐dimensional ‘Personal Need for Structure’ construct, and compared in terms of multiple criteria. Mplus is used as a software framework for (mixed) polytomous Rasch models and item response tree models as well as for demonstrating how parsimonious model variants can be specified to test assumptions on the structure of response styles and attitude strength. Although both frameworks are shown to account for response styles, they differ on the quantitative criteria of model selection, practical aspects of model estimation, and conceptual issues of representing response styles as continuous and multidimensional sources of individual differences in psychological assessment.

14.
For testlet response data, traditional item response theory (IRT) models are often not appropriate due to local dependence presented among items within a common testlet. Several testlet‐based IRT models have been developed to model examinees' responses. In this paper, a new two‐parameter normal ogive testlet response theory (2PNOTRT) model for dichotomous items is proposed by introducing testlet discrimination parameters. A Bayesian model parameter estimation approach via a data augmentation scheme is developed. Simulations are conducted to evaluate the performance of the proposed 2PNOTRT model. The results indicated that the estimation of item parameters is satisfactory overall from the viewpoint of convergence. Finally, the proposed 2PNOTRT model is applied to a set of real testlet data.

15.
The paper suggests an extension of current probability models of mental testing to include practice effects resulting when the same subject responds to the same ability test item on several occasions. The method involves the treatment of multiple presentations of the same item as a stochastic process, i.e., a series of interrelated stochastic events. Some general results are presented and a particular model for practice effects utilizing the latent classes and linear operator learning models is discussed at some length. Methods of parameter estimation and testing goodness of fit are presented and illustrated with a numerical example.

16.
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi‐parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non‐parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non‐parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees’ latent speeds; whereas the non‐parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box–Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two‐stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non‐parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested.

17.
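Multidimensional testlet-effect Rasch model   Total citations: 2, self-citations: 0, citations by others: 2
First, this paper interprets the essence of a "testlet" as a set of items sharing a common stimulus. On this basis, testlet effects are divided into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Second, building on the Rasch model, the paper develops multidimensional testlet-effect Rasch models for dichotomous and polytomous scoring, with the aim of better handling within-item multidimensional testlet effects. Finally, simulation results show that the new model is valid and reasonable. Comparisons with the Rasch testlet model and the partial credit model indicate that: (1) when a test contains within-item multidimensional testlet effects, separating out only the obvious bundled testlet effects while ignoring other latent testlet effects still leads to biased parameter estimates and may even overestimate test reliability; (2) the new model is more generally applicable: even when the response data contain no testlet effect or only within-item unidimensional testlet effects, analysing the test with the new model still yields good parameter estimates.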

18.
In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed.
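For orientation, the two response time models compared in this abstract are commonly written as below. The notation is an assumption, since the abstract gives no formulas: T_ij is the response time of test taker j on item i, λ_i the time intensity of the item, ζ_j the speed of the test taker, σ_i² the item-specific residual variance, and φ_i the speed sensitivity parameter that the extended model adds.

```latex
% Lognormal model without speed sensitivity (notation assumed):
\ln T_{ij} = \lambda_i - \zeta_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_i^2)

% Extension with an item-specific speed sensitivity parameter \phi_i:
\ln T_{ij} = \lambda_i - \phi_i \zeta_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma_i^2)
```

Under this parameterization, setting φ_i = 1 for every item reduces the extended model to the first one, which is why test forms that differ in average φ_i can treat slow test takers differently.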

19.
A multinormal partial credit model for factor analysis of polytomously scored items with ordered response categories is derived using an extension of the Dutch Identity (Holland in Psychometrika 55:5–18, 1990). In the model, latent variables are assumed to have a multivariate normal distribution conditional on unweighted sums of item scores, which are sufficient statistics. Attention is paid to maximum likelihood estimation of item parameters, multivariate moments of latent variables, and person parameters. It is shown that the maximum likelihood estimates can be found without the use of numerical integration techniques. More general models are discussed which can be used for testing the model, and it is shown how models with different numbers of latent variables can be tested against each other. In addition, multi-group extensions are proposed, which can be used for testing both measurement invariance and latent population differences. Models and procedures discussed are demonstrated in an empirical data example.

20.
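Gao Xuliang, Wang Daxun, Wang Fang, Cai Yan, Tu Dongbo. Acta Psychologica Sinica, 2019, 51(12): 1386-1397
Following the approach of the partial credit model, this paper proposes a General Partial Credit Diagnostic Model (GPCDM). Compared with the existing polytomous models based on the partial credit approach, GDM (von Davier, 2008) and PC-DINA (de la Torre, 2012), the GPCDM has a more flexible Q-matrix definition and fewer constraints on item parameters. A Monte Carlo study showed that the RMSE of the GPCDM parameter estimates fell within [0.015, 0.043], indicating acceptable estimation accuracy. An application to TIMSS (2007) empirical data showed that, compared with the GDM and PC-DINA models, the GPCDM fit the data better and also gave better diagnostic results. In sum, this study provides a polytomous cognitive diagnostic model with fewer constraints and greater capability.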
