Similar literature
1.
For testlet response data, traditional item response theory (IRT) models are often not appropriate because of local dependence among items within a common testlet. Several testlet‐based IRT models have been developed to model examinees' responses. In this paper, a new two‐parameter normal ogive testlet response theory (2PNOTRT) model for dichotomous items is proposed by introducing testlet discrimination parameters. A Bayesian model parameter estimation approach via a data augmentation scheme is developed. Simulations are conducted to evaluate the performance of the proposed 2PNOTRT model. The results indicate that the estimation of item parameters is satisfactory overall from the viewpoint of convergence. Finally, the proposed 2PNOTRT model is applied to a set of real testlet data.

2.
This paper presents an explanatory multidimensional multilevel random item response model and its application to reading data with multilevel item structure. The model includes multilevel random item parameters that allow consideration of variability in item parameters at both item and item group levels. Item-level random item parameters were included to model unexplained variance remaining when item-related covariates were used to explain variation in item difficulties. Item group-level random item parameters were included to model dependency in item responses among items having the same item stem. Using the model, this study examined the dimensionality of a person’s word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.

3.
This paper extends Lumsden's fluctuation model to the graded response case and, from the resulting basic scaling model, develops a one‐dimensional item response theory graded response model (GRM). Under some additional assumptions, it follows that the item category response functions (ICRFs) can be closely approximated by the ICRFs of the standard GRM with equal item discrimination. For fixed item locations, the item responses depend on two individual differences parameters: the person central location and the person reliability. Procedures for estimating the person parameters and for addressing the goodness of fit of the proposed model as compared to the standard GRM are discussed. The accuracy of the person estimates is assessed by means of simulation studies. Finally, all the developments are illustrated using three empirical examples in personality measurement.

4.
This article describes a generalized longitudinal mixture item response theory (IRT) model that allows for detecting latent group differences in item response data obtained from electronic learning (e-learning) environments or other learning environments that result in large numbers of items. The described model can be viewed as a combination of a longitudinal Rasch model, a mixture Rasch model, and a random-item IRT model, and it includes some features of the explanatory IRT modeling framework. The model assumes the possible presence of latent classes in item response patterns, due to initial person-level differences before learning takes place, to latent class-specific learning trajectories, or to a combination of both. Moreover, it allows for differential item functioning over the classes. A Bayesian model estimation procedure is described, and the results of a simulation study are presented that indicate that the parameters are recovered well, particularly for conditions with large item sample sizes. The model is also illustrated with an empirical sample data set from a Web-based e-learning environment.

5.
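Hierarchical linear modeling is an advanced statistical method for data with a hierarchical structure, and item response theory (IRT) is a modern measurement theory for measuring examinee ability precisely. Multilevel IRT combines the two by nesting an item response model within a hierarchical linear model, which makes it possible to estimate item parameters together with ability parameters at different levels and yields more precise estimates of regression coefficients and error variances. The authors outline the development of multilevel IRT, review its applications in psychological and educational measurement with respect to differential item functioning, test equating, and school effectiveness research, summarize its value, and discuss future research trends.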

6.
Testing the fit of finite mixture models is a difficult task, since asymptotic results on the distribution of likelihood ratio statistics do not hold; for this reason, alternative statistics are needed. This paper applies the π* goodness of fit statistic to finite mixture item response models. The π* statistic assumes that the population is composed of two subpopulations – those that follow a parametric model and a residual group outside the model; π* is defined as the proportion of population in the residual group. The population was divided into two or more groups, or classes. Several groups followed an item response model and there was also a residual group. The paper presents maximum likelihood algorithms for estimating item parameters, the probabilities of the groups and π*. The paper also includes a simulation study on goodness of recovery for the two‐ and three‐parameter logistic models and an example with real data from a multiple choice test.

7.
A first-order autoregressive growth model is proposed for longitudinal binary item analysis where responses to the same items are conditionally dependent across time given the latent traits. Specifically, the item response probability for a given item at a given time depends on the latent trait as well as the response to the same item at the previous time, or the lagged response. An initial conditions problem arises because there is no lagged response at the initial time period. We handle this problem by adapting solutions proposed for dynamic models in panel data econometrics. Asymptotic and finite sample power for the autoregressive parameters are investigated. The consequences of ignoring local dependence and the initial conditions problem are also examined for data simulated from a first-order autoregressive growth model. The proposed methods are applied to longitudinal data on Korean students’ self-esteem.
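
The lagged-response idea in the abstract above can be sketched as follows. This is a minimal illustration only: the logistic link, the additive form theta - beta + gamma * y_prev, and the names theta, beta, gamma, and y_prev are assumptions made for the example, not the parameterization used in the article.

```python
import math


def lagged_response_prob(theta: float, beta: float, gamma: float, y_prev: int) -> float:
    """Probability of a positive response at time t (illustrative form).

    theta  : latent trait at time t (hypothetical name)
    beta   : item difficulty (hypothetical name)
    gamma  : autoregressive effect of the lagged response (hypothetical name)
    y_prev : response to the same item at time t - 1, coded 0 or 1
    """
    if y_prev not in (0, 1):
        raise ValueError("y_prev must be 0 or 1")
    # Linear predictor: trait minus difficulty plus the lagged-response effect.
    eta = theta - beta + gamma * y_prev
    return 1.0 / (1.0 + math.exp(-eta))
```

With gamma set to 0 the lagged response has no effect and the function reduces to an ordinary Rasch-type response probability. The initial conditions problem mentioned in the abstract shows up here directly: at the first time point there is no y_prev to pass in.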

8.
In optimal design research, designs are optimized with respect to some statistical criterion under a certain model for the data. The ideas from optimal design research have spread into various fields of research, and recently have been adopted in test theory and applied to item response theory (IRT) models. In this paper, a generalized variance criterion is used for sequential sampling in the two-parameter IRT model. Some general principles are offered to enable a researcher to select the best sampling design for the efficient estimation of item parameters.

9.
Bayesian estimation of a multilevel IRT model using Gibbs sampling   Total citations: 3 (self-citations: 0; citations by others: 3)
In this article, a two-level regression model is imposed on the ability parameters in an item response theory (IRT) model. The advantage of using latent rather than observed scores as dependent variables of a multilevel model is that it offers the possibility of separating the influence of item difficulty and ability level and modeling response variation and measurement error. Another advantage is that, contrary to observed scores, latent scores are test-independent, which offers the possibility of using results from different tests in one analysis where the parameters of the IRT model and the multilevel model can be concurrently estimated. The two-parameter normal ogive model is used for the IRT measurement model. It will be shown that the parameters of the two-parameter normal ogive model and the multilevel model can be estimated in a Bayesian framework using Gibbs sampling. Examples using simulated and real data are given.
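
For readers unfamiliar with the two-parameter normal ogive model named in the abstract above, its response function can be sketched as below. The sketch assumes one common parameterization, P = Phi(a * theta - b); the article may define the parameters differently, and the names theta, a, and b are chosen here for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_ogive_2p(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under P = Phi(a * theta - b).

    theta : examinee ability
    a     : item discrimination (assumed parameterization)
    b     : item difficulty-type intercept (assumed parameterization)
    """
    return normal_cdf(a * theta - b)
```

In this form an examinee with theta = 0 answering an item with b = 0 has probability 0.5 of a correct response, and a larger a makes the curve steeper around that point.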

10.
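Equating is receiving growing attention from testing organizations and measurement researchers, and the advantages of IRT-based equating in particular have increased their confidence in it. Many, however, have overlooked how, and how much, the shape of the examinee ability distribution may affect equating results. Focusing on how item parameter equating under dichotomous IRT models differs across ability distributions of different shapes, this study examines the error that examinee sampling bias may introduce into item characteristic curve equating. The results show that the shape of the ability distribution significantly affects the coefficients of item parameter equating; in particular, the skewness of the ability distribution has a significant linear correlation with the intercept of the equating equation, whereas changes in the shape of the distribution have no clear effect on its slope.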

11.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.

12.
In test operations using IRT (item response theory), items are included in a test before being used to rate subjects and the response data are used to estimate their item parameters. However, this method of test operation may lead to item content leakage and an adequate test operation can become difficult. To address this problem, Ozaki and Toyoda (2005, 2006) developed item difficulty parameter estimation methods that use paired comparison data from the perspective of the difficulty of items as judged by raters familiar with the field. In the present paper, an improved method of item difficulty parameter estimation is developed. In this new method, an item for which the difficulty parameter is to be estimated is compared with multiple items simultaneously, from the perspective of their difficulty. This is not a one-to-one comparison but a one-to-many comparison. In the comparisons, raters are informed that items selected from an item pool are ordered according to difficulty. The order will provide insight to improve the accuracy of judgment.

13.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.

14.
Marginal maximum‐likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first‐level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

15.
With a few exceptions, the problem of linking item response model parameters from different item calibrations has been conceptualized as an instance of the problem of test equating scores on different test forms. This paper argues, however, that the use of item response models does not require any test score equating. Instead, it involves the necessity of parameter linking due to a fundamental problem inherent in the formal nature of these models: their general lack of identifiability. More specifically, item response model parameters need to be linked to adjust for the different effects of the identifiability restrictions used in separate item calibrations. Our main theorems characterize the formal nature of these linking functions for monotone, continuous response models, derive their specific shapes for different parameterizations of the 3PL model, and show how to identify them from the parameter values of the common items or persons in different linking designs.

16.
Liu Hongyun (刘红云), Luo Fang (骆方), Wang Yue (王玥), Zhang Yu (张玉). Acta Psychologica Sinica (《心理学报》), 2012, 44(1): 121-132
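The authors briefly review categorical data factor analysis (CCFA) models within the SEM framework and models relating test items to latent abilities within the MIRT framework, and summarize the main parameter estimation methods under both. A simulation study compares the WLSc and WLSMV estimators under SEM with the MLR and MCMC estimators under MIRT. The results show that: (1) WLSc yields the most biased parameter estimates and has convergence problems; (2) the precision of all item parameter estimates improves as sample size increases, and WLSMV and MLR differ very little in precision and in most cases perform no worse than MCMC; (3) except for WLSc, precision increases as the number of items per dimension grows; (4) the number of test dimensions has a relatively large effect on discrimination and difficulty parameters but a relatively small effect on item factor loadings and thresholds; (5) precision depends on the number of dimensions an item measures, with items measuring a single dimension estimated more precisely. The article also offers some suggestions on issues to note when applying the two approaches in practice.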

17.
This paper proposes two unidimensional item response theory (IRT) models for analysing normative forced‐choice personality items. Both models are derived from a common theoretical framework and arise as a result of different assumptions regarding the mechanism of choice. The simplest mechanism gives rise to the one‐parameter normal‐ogive model. The second mechanism gives rise to a new IRT model, which is closely related to the Coombs–Zinnes probabilistic unfolding model. The second model is compared theoretically to the normal‐ogive model in terms of item characteristic curves and amount of item information. Next, procedures for estimating the respondent and the item parameters in the second model are described. Finally, both models are empirically compared by using two well‐known personality measures.

18.
The paper addresses three neglected questions from IRT. In section 1, the properties of the “measurement” of ability or trait parameters and item difficulty parameters in the Rasch model are discussed. It is shown that the solution to this problem is rather complex and depends both on general assumptions about properties of the item response functions and on assumptions about the available item universe. Section 2 deals with the measurement of individual change or “modifiability” based on a Rasch test. A conditional likelihood approach is presented that yields (a) an ML estimator of modifiability for given item parameters, (b) allows one to test hypotheses about change by means of a Clopper-Pearson confidence interval for the modifiability parameter, or (c) to estimate modifiability jointly with the item parameters. Uniqueness results for all three methods are also presented. In section 3, the Mantel-Haenszel method for detecting DIF is discussed under a novel perspective: What is the most general framework within which the Mantel-Haenszel method correctly detects DIF of a studied item? The answer is that this is a 2PL model where, however, all discrimination parameters are known and the studied item has the same discrimination in both populations. Since these requirements would hardly be satisfied in practical applications, the case of constant discrimination parameters, that is, the Rasch model, is the only realistic framework. A simple Pearson χ² test for DIF of one studied item is proposed as an alternative to the Mantel-Haenszel test; moreover, this test is generalized to the case of two items simultaneously studied for DIF.
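
As background to the Mantel-Haenszel discussion in the abstract above, the standard Mantel-Haenszel common odds ratio estimator can be sketched as follows. This is the textbook estimator, not the Pearson χ² alternative the paper proposes; the function name and the (a, b, c, d) table layout are choices made for this example.

```python
from typing import Iterable, Tuple

# One 2x2 table per matched score level:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[float, float, float, float]


def mantel_haenszel_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel estimate of the common odds ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0.0:
        raise ValueError("odds ratio is undefined: denominator is zero")
    return numerator / denominator
```

A value near 1 suggests that, at matched score levels, the reference and focal groups have similar odds of answering the studied item correctly; values far from 1 point to possible DIF.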

19.
A person fit test based on the Lagrange multiplier test is presented for three item response theory models for polytomous items: the generalized partial credit model, the sequential model, and the graded response model. The test can also be used in the framework of multidimensional ability parameters. It is shown that the Lagrange multiplier statistic can take both the effects of estimation of the item parameters and the estimation of the person parameters into account. The Lagrange multiplier statistic has an asymptotic χ²-distribution. The Type I error rate and power are investigated using simulation studies. Results show that test statistics that ignore the effects of estimation of the persons’ ability parameters have decreased Type I error rates and power. Incorporating a correction to account for the effects of the estimation of the persons’ ability parameters results in acceptable Type I error rates and power characteristics; incorporating a correction for the estimation of the item parameters has very little additional effect. It is investigated to what extent the three models give comparable results, both in the simulation studies and in an example using data from the NEO Personality Inventory-Revised.

20.
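Item response theory (IRT) is one of the modern educational and psychological measurement theories used for objective measurement, and it is widely applied to large-scale assessments, where missing data are very common. For the two-parameter logistic model (2PLM), the only available EM algorithm handles missing responses and missing abilities under the missing-completely-at-random mechanism. This study derives an EM algorithm for the 2PLM that ignores missing responses, and proposes both an EM algorithm that handles missing responses and missing abilities under the missing-at-random mechanism and a multiple imputation method that accounts for uncertainty in ability estimates and item responses. The results show that, across missingness mechanisms, missingness proportions, and test designs, the EM algorithm that ignores missing responses and the multiple imputation method perform well.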
