When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).

A joint Bayesian estimation procedure for the estimation of parameters in the three-parameter logistic model is developed in this paper. Procedures for specifying prior beliefs for the parameters are given. It is shown through simulation studies that the Bayesian procedure (i) ensures that the estimates stay in the parameter space, and (ii) produces better estimates than the joint maximum likelihood procedure as judged by such criteria as mean squared differences between estimates and true values. The research reported here was performed pursuant to Grant No. N0014-79-C-0039 with the Office of Naval Research. A related article by Robert J. Mislevy (1986) appeared when the present paper was in the printing stage.

Information functions are used to find the optimum ability levels and maximum contributions to information for estimating item parameters in three commonly used logistic item response models. For the three- and two-parameter logistic models, examinees who contribute maximally to the estimation of item difficulty contribute little to the estimation of item discrimination. This suggests that in applications that depend heavily upon the veracity of individual item parameter estimates (e.g., adaptive testing or test construction), better item calibration results may be obtained (for fixed sample sizes) from examinee calibration samples in which ability is widely dispersed. This work was supported by Contract No. N00014-83-C-0457, project designation NR 150-520, from Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research and Educational Testing Service through the Program Research Planning Council. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to acknowledge the invaluable assistance of Maxine B. Kingston in carrying out this study, and to thank Charles Lewis for his many insightful comments on earlier drafts of this paper.
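
The contrast between difficulty and discrimination can be illustrated with a small sketch. It assumes the two-parameter logistic model without the scaling constant D, P(θ) = 1 / (1 + exp(−a(θ − b))), for which one examinee's contribution to the Fisher information is a²PQ for b and (θ − b)²PQ for a. The function names are illustrative and do not come from the paper.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed form, no D constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_difficulty(theta, a, b):
    """One examinee's contribution to the Fisher information about b: a^2 * P * Q."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def info_discrimination(theta, a, b):
    """One examinee's contribution to the Fisher information about a: (theta - b)^2 * P * Q."""
    p = p_2pl(theta, a, b)
    return (theta - b) ** 2 * p * (1.0 - p)

if __name__ == "__main__":
    a, b = 1.2, 0.0  # hypothetical item parameters
    for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
        print(theta,
              round(info_difficulty(theta, a, b), 4),
              round(info_discrimination(theta, a, b), 4))
```

Under these assumptions, an examinee with θ = b contributes the maximum a²/4 to the information about b and exactly zero to the information about a, which is why a widely dispersed calibration sample helps the discrimination estimates.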

Among current state-of-the-art estimation methods for multilevel IRT models, the two-stage divide-and-conquer strategy has practical advantages, such as clearer definition of factors, convenience for secondary data analysis, convenience for model calibration and fit evaluation, and avoidance of improper solutions. However, various studies have shown that, under the two-stage framework, ignoring measurement error in the dependent variable in stage II leads to incorrect statistical inferences. To this end, we proposed a novel method to correct both measurement bias and measurement error of latent trait estimates from stage I in the stage II estimation. In this paper, the HO-IRT model is considered as the measurement model, and a linear mixed effects model on overall (i.e., higher-order) abilities is considered as the structural model. The performance of the proposed correction method is illustrated and compared via a simulation study and a real data example using the National Educational Longitudinal Survey data (NELS 88). Results indicate that structural parameters can be recovered better after correcting measurement biases and errors.

The test information function serves important roles in latent trait models and in their applications. Among others, it has been used as the measure of accuracy in ability estimation. A question arises, however, if the test information function is accurate enough for all meaningful levels of ability relative to the test, especially when the number of test items is relatively small (e.g., less than 50). In the present paper, using the constant information model and constant amounts of test information for a finite interval of ability, simulated data were produced for eight different levels of ability and for twenty different numbers of test items ranging between 10 and 200. Analyses of these data suggest that it is desirable to consider some modification of the test information function when it is used as the measure of accuracy in ability estimation.

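Using a character-recognition vocabulary test for students in compulsory education, this study combined exploratory structural equation modeling (ESEM) with two methods from nonparametric item response theory, Mokken scale analysis and DETECT analysis, to examine the dimensionality of character-recognition ability. The ESEM results showed that a unidimensional model of character recognition fit better than multidimensional models, and that the multidimensional solutions mainly reflected a difficulty dimension, namely the effect of character frequency. Mokken scale analysis showed that the tests for grades 1-2 and grades 3-9 both tended toward the characteristics of a unidimensional scale. DETECT analysis showed that the D values of both tests approached zero, indicating that character-recognition ability is unidimensional. Taken together, the three analyses indicate that character-recognition ability is unidimensional.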

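Warm's Weighted Maximum Likelihood Estimation of Latent Trait Parameters in the Four-Parameter Logistic Model
Meng Xiangbin, Tao Jian, Chen Shali. Acta Psychologica Sinica (《心理学报》), 2016, (8): 1047-1056
This paper takes the four-parameter logistic (4PL) model as its subject. Following Warm's weighted maximum likelihood technique, it proposes a weighted maximum likelihood estimator of the latent trait parameter in the 4PL model and verifies the estimator's properties in a simulation study. The results show that, compared with the usual maximum likelihood estimator and the expected a posteriori estimator, the weighted maximum likelihood estimator has markedly smaller bias and recovers the true values well. Moreover, when the test is short and item discrimination is low, the weighted maximum likelihood estimator retains good statistical properties and shows an even more pronounced advantage.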

In item response theory, the classical estimators of ability are highly sensitive to response disturbances and can return strongly biased estimates of the true underlying ability level. Robust methods were introduced to lessen the impact of such aberrant responses on the estimation process. The computation of asymptotic (i.e., large‐sample) standard errors (ASE) for these robust estimators, however, has not yet been fully considered. This paper focuses on a broad class of robust ability estimators, defined by an appropriate selection of the weight function and the residual measure, for which the ASE is derived from the theory of estimating equations. The maximum likelihood (ML) and the robust estimators, together with their estimated ASEs, are then compared in a simulation study by generating random guessing disturbances. It is concluded that both the estimators and their ASE perform similarly in the absence of random guessing, while the robust estimator and its estimated ASE are less biased and outperform their ML counterparts in the presence of random guessing with large impact on the item response process.

Multidimensional item response theory (MIRT) is widely used in assessment and evaluation of educational and psychological tests. It models the individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that involve either direct numerical approximations to the integrals or Monte Carlo simulations. However, these methods are known to be computationally demanding in high dimensions and rely on sampling data points from a posterior distribution. We propose a new Gaussian variational expectation-maximization (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. In addition, the proposed algorithm can be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies are conducted to demonstrate the computational efficiency and estimation precision of the new GVEM algorithm compared to the popular alternative Metropolis–Hastings Robbins–Monro algorithm. In addition, theoretical results are presented to establish the consistency of the estimator from the new GVEM algorithm.

A commonly used method to evaluate the accuracy of a measurement is to provide a confidence interval that contains the parameter of interest with a given high probability. Smallest exact confidence intervals for the ability parameter of the Rasch model are derived and compared to the traditional, asymptotically valid intervals based on the Fisher information. Tables of the exact confidence intervals, termed Clopper-Pearson intervals, can be routinely drawn up by applying a computer program designed by and obtainable from the author. These tables are particularly useful for tests of only moderate lengths where the asymptotic method does not provide valid confidence intervals.
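
The idea of an exact interval for Rasch ability can be sketched by inverting the tail probabilities of the raw score, which is the sufficient statistic for ability when item difficulties are treated as known. This is the standard equal-tailed Clopper-Pearson-style construction and is only an assumption about the general approach; the paper's "smallest" intervals and the author's program may use a different construction. All function names and the search bound are hypothetical.

```python
import math

def score_distribution(theta, difficulties):
    """Distribution of the raw score under the Rasch model, built item by item."""
    dist = [1.0]
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # item answered incorrectly
            new[k + 1] += mass * p       # item answered correctly
        dist = new
    return dist

def _bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_interval(raw_score, difficulties, alpha=0.05, bound=15.0):
    """Equal-tailed exact interval for theta; bound is an assumed search range."""
    n = len(difficulties)
    lower, upper = -math.inf, math.inf
    if raw_score > 0:
        # P(R >= r | theta) increases with theta; solve for alpha / 2.
        lower = _bisect(
            lambda t: sum(score_distribution(t, difficulties)[raw_score:]) - alpha / 2,
            -bound, bound)
    if raw_score < n:
        # P(R <= r | theta) decreases with theta; negate so the root finder applies.
        upper = _bisect(
            lambda t: alpha / 2 - sum(score_distribution(t, difficulties)[:raw_score + 1]),
            -bound, bound)
    return lower, upper

if __name__ == "__main__":
    print(exact_interval(5, [0.0] * 10))
```

For a zero or perfect raw score the sketch returns an infinite bound on one side, reflecting that such scores carry no information about one end of the ability scale.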

Samejima identified the possibility of multiple solutions to the likelihood equation (multiple maxima in the likelihood function) for estimating an examinee's trait value for the three-parameter logistic model. In the practical applications that Lord studied, he found that multiple solutions did not occur when the number of items was 20. In the present paper, fourteen multiple-choice achievement tests with from 20 to 50 items were examined to see if it was possible for them to produce item response vectors with multiple maxima; such vectors were found for all the tests. Examination of response vectors for large groups of real examinees found that from 0 to 3.1% of them had response vectors with multiple maxima. The implications of these results for multiple-choice tests are discussed.
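
One simple way to screen a response vector for multiple maxima is to evaluate the log-likelihood on a fine grid of trait values and count the interior local maxima. The sketch below assumes the three-parameter logistic model in the form P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))) without the scaling constant D; the grid range, step size, and function names are illustrative choices and not the procedure used in the paper.

```python
import math

def loglik_3pl(theta, responses, items):
    """Log-likelihood of a 0/1 response vector; items are (a, b, c) tuples."""
    ll = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

def count_local_maxima(responses, items, lo=-6.0, hi=6.0, step=0.01):
    """Count grid points whose log-likelihood exceeds both neighbours."""
    n = int(round((hi - lo) / step))
    values = [loglik_3pl(lo + i * step, responses, items) for i in range(n + 1)]
    return sum(
        1
        for i in range(1, n)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )

if __name__ == "__main__":
    # Hypothetical items; with all c = 0 the model reduces to the 2PL,
    # whose log-likelihood is concave, so at most one maximum is expected.
    items_2pl = [(1.0, -0.7, 0.0), (1.3, 0.2, 0.0), (0.8, 1.1, 0.0)]
    print(count_local_maxima([1, 0, 1], items_2pl))
```

A grid search can miss maxima that are closer together than the step size or that lie outside the search range, so a count of one is suggestive rather than conclusive.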

This paper proposes a multi-objective programming method for determining samples of examinees needed for estimating the parameters of a group of items. In the numerical experiments, optimum samples are compared to uniformly and normally distributed samples. The results show that the samples usually recommended in the literature are well suited for estimating the difficulty parameters. Furthermore, they are also adequate for estimating the discrimination parameters in the three-parameter model, but not for the guessing parameters.

A method is proposed for constructing indices as linear functions of variables such that the reliability of the compound score is maximized. Reliability is defined in the framework of latent variable modeling [i.e., item response theory (IRT)] and optimal weights of the components of the index are found by maximizing the posterior variance relative to the total latent variable variance. Three methods for estimating the weights are proposed. The first is a likelihood-based approach, that is, marginal maximum likelihood (MML). The other two are Bayesian approaches based on Markov chain Monte Carlo (MCMC) computational methods. One is based on an augmented Gibbs sampler specifically targeted at IRT, and the other is based on a general purpose Gibbs sampler such as implemented in OpenBUGS and JAGS. Simulation studies are presented to demonstrate the procedure and to compare the three methods. The results are very similar, so practitioners may prefer the easily accessible latter method. A real data set pertaining to the 28-joint Disease Activity Score is used to show how the methods can be applied in a complex measurement situation with multiple time points and mixed data formats.

Applications of item response theory, which depend upon its parameter invariance property, require that parameter estimates be unbiased. A new method, weighted likelihood estimation (WLE), is derived, and proved to be less biased than maximum likelihood estimation (MLE) with the same asymptotic variance and normal distribution. WLE removes the first order bias term from MLE. Two Monte Carlo studies compare WLE with MLE and Bayesian modal estimation (BME) of ability in conventional tests and tailored tests, assuming the item parameters are known constants. The Monte Carlo studies favor WLE over MLE and BME on several criteria over a wide range of the ability scale.
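
A minimal sketch of weighted likelihood estimation for a single examinee follows. It assumes the two-parameter logistic model without the scaling constant D and known item parameters, in which case the WLE solves score(θ) + J(θ) / (2 I(θ)) = 0 with I(θ) = Σ a²PQ and J(θ) = Σ a³PQ(1 − 2P). The bisection root finder, search range, and function names are illustrative assumptions and are not taken from the paper.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed form, no D constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def wle_equation(theta, responses, items):
    """Likelihood score plus the bias-correction term J / (2 I)."""
    score = info = j = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        q = 1.0 - p
        score += a * (u - p)
        info += a * a * p * q
        j += a ** 3 * p * q * (1.0 - 2.0 * p)
    return score + j / (2.0 * info)

def wle(responses, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the WLE equation, assuming it is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wle_equation(mid, responses, items) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical Rasch-type items (a = 1) with difficulties symmetric about zero.
    items = [(1.0, -1.5), (1.0, -0.5), (1.0, 0.5), (1.0, 1.5)]
    print(wle([1, 1, 0, 0], items))  # mixed response pattern
    print(wle([1, 1, 1, 1], items))  # perfect score: the MLE is infinite, the WLE is not
```

The perfect-score case shows one practical consequence of the correction term: the estimate stays finite where the unweighted likelihood has no maximum.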

Using Lumsden’s Thurstonian fluctuation model as a starting point, this paper attempts to develop a unidimensional item response theory model intended for binary personality items. Under some additional assumptions, a new model is obtained in which the item characteristic curves are defined by a cumulative Pearson-Type-VII distribution, and the person response curves are two-parameter normal ogives. Procedures for fitting the new model are proposed. Furthermore, the relations between individual fluctuation and scalability are discussed, and a scalability index based on the new model is proposed. All the developments in this paper are illustrated using two empirical examples.

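Multidimensional item response theory (MIRT) is valued by researchers and practitioners for the inherent advantages of its models and for combining the strengths of factor analysis and item response theory. Building on previous work, this study focuses on estimating the multidimensional abilities in MIRT and the correlation matrix among them. A Monte Carlo simulation with a three-factor completely randomized design (4 × 3 × 3) was conducted, using an MCMC algorithm, to examine how the number of test dimensions, the size of the correlations between dimensions, and the number of test items affect the estimation of MIRT abilities and their correlation matrix.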

An estimate and an upper-bound estimate for the reliability of a test composed of binary items are derived from the multidimensional latent trait theory proposed by Bock and Aitkin (1981). The estimate derived here is similar to internal consistency estimates (such as coefficient alpha) in that it is a function of the correlations among test items; however, it is not a lower-bound estimate as are all other similar methods. An upper bound to reliability that is less than unity does not exist in the context of classical test theory. The richer theoretical background provided by Bock and Aitkin's latent trait model has allowed the development of an index (called here) that is always greater than or equal to the reliability coefficient for a test (and is less than or equal to one). The upper-bound estimate of reliability has practical uses, one of which makes use of the greatest lower bound.

Two methods of estimating parameters in the Rasch model are compared. It is shown that estimates for a certain loglinear model for the score × item × response table are equivalent to the unconditional maximum likelihood estimates for the Rasch model.

