首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 9 毫秒
1.
Analysing ordinal data is becoming increasingly important in psychology, especially in the context of item response theory. The generalized partial credit model (GPCM) is probably the most widely used ordinal model and has found application in many large-scale educational assessment studies such as PISA. In the present paper, optimal test designs are investigated for estimating persons’ abilities with the GPCM for calibrated tests when item parameters are known from previous studies. We find that local optimality may be achieved by assigning non-zero probability only to the first and last categories independently of a person's ability. That is, when using such a design, the GPCM reduces to the dichotomous two-parameter logistic (2PL) model. Since locally optimal designs require the true ability to be known, we consider alternative Bayesian design criteria using weight distributions over the ability parameter space. For symmetric weight distributions, we derive necessary conditions for the optimal one-point design of two response categories to be Bayes optimal. Furthermore, we discuss examples of common symmetric weight distributions and investigate under what circumstances the necessary conditions are also sufficient. Since the 2PL model is a special case of the GPCM, all of these results hold for the 2PL model as well.  相似文献   

2.
A number of models for categorical item response data have been proposed in recent years. The models appear to be quite different. However, they may usefully be organized as members of only three distinct classes, within which the models are distinguished only by assumptions and constraints on their parameters. “Difference models” are appropriate for ordered responses, “divide-by-total” models may be used for either ordered or nominal responses, and “left-side added” models are used for multiple-choice responses with guessing. The details of the taxonomy and the models are described in this paper. The present study was supported in part by two postdoctoral fellowships awarded to Lynne Steinberg: an Educational Testing Service Postdoctoral Fellowship at ETS, Princeton, NJ and an NIMH Individual National Research Service Award at Stanford University, Stanford, CA. Helpful comments by the editor and three anonymous reviewers are gratefully acknowledged.  相似文献   

3.
Various different item response theory (IRT) models can be used in educational and psychological measurement to analyze test data. One of the major drawbacks of these models is that efficient parameter estimation can only be achieved with very large data sets. Therefore, it is often worthwhile to search for designs of the test data that in some way will optimize the parameter estimates. The results from the statistical theory on optimal design can be applied for efficient estimation of the parameters.A major problem in finding an optimal design for IRT models is that the designs are only optimal for a given set of parameters, that is, they are locally optimal. Locally optimal designs can be constructed with a sequential design procedure. In this paper minimax designs are proposed for IRT models to overcome the problem of local optimality. Minimax designs are compared to sequentially constructed designs for the two parameter logistic model and the results show that minimax design can be nearly as efficient as sequentially constructed designs.  相似文献   

4.
A Bayesian procedure is developed for the estimation of parameters in the two-parameter logistic item response model. Joint modal estimates of the parameters are obtained and procedures for the specification of prior information are described. Through simulation studies it is shown that Bayesian estimates of the parameters are superior to maximum likelihood estimates in the sense that they are (a) more meaningful since they do not drift out of range, and (b) more accurate in that they result in smaller mean squared differences between estimates and true values.The research reported here was performed pursuant to Grant No. N0014-79-C-0039 with the Office of Naval Research.  相似文献   

5.
A monotone relationship between a true score (τ) and a latent trait level (θ) has been a key assumption for many psychometric applications. The monotonicity property in dichotomous response models is evident as a result of a transformation via a test characteristic curve. Monotonicity in polytomous models, in contrast, is not immediately obvious because item response functions are determined by a set of response category curves, which are conceivably non-monotonic in θ. The purpose of the present note is to demonstrate strict monotonicity in ordered polytomous item response models. Five models that are widely used in operational assessments are considered for proof: the generalized partial credit model (Muraki, 1992, Applied Psychological Measurement, 16, 159), the nominal model (Bock, 1972, Psychometrika, 37, 29), the partial credit model (Masters, 1982, Psychometrika, 47, 147), the rating scale model (Andrich, 1978, Psychometrika, 43, 561), and the graded response model (Samejima, 1972, A general model for free-response data (Psychometric Monograph no. 18). Psychometric Society, Richmond). The study asserts that the item response functions in these models strictly increase in θ and thus there exists strict monotonicity between τ and θ under certain specified conditions. This conclusion validates the practice of customarily using τ in place of θ in applied settings and provides theoretical grounds for one-to-one transformations between the two scales.  相似文献   

6.
Wendy M. Yen 《Psychometrika》1985,50(4):399-410
When the three-parameter logistic model is applied to tests covering a broad range of difficulty, there frequently is an increase in mean item discrimination and a decrease in variance of item difficulties and traits as the tests become more difficult. To examine the hypothesis that this unexpected scale shrinkage effect occurs because the items increase in complexity as they increase in difficulty, an approximate relationship is derived between the unidimensional model used in data analysis and a multidimensional model hypothesized to be generating the item responses. Scale shrinkage is successfully predicted for several sets of simulated data.The author is grateful to Robert Mislevy for kindly providing a copy of his computer program, RESOLVE.  相似文献   

7.
An IRT model based on the Rasch model is proposed for composite tasks, that is, tasks that are decomposed into subtasks of different kinds. There is one subtask for each component that is discerned in the composite tasks. A component is a generic kind of subtask of which the subtasks resulting from the decomposition are specific instantiations with respect to the particular composite tasks under study. The proposed model constrains the difficulties of the composite tasks to be linear combinations of the difficulties of the corresponding subtask items, which are estimated together with the weights used in the linear combinations, one weight for each kind of subtask. Although the model does not belong to the exponential family, its parameters can be estimated using conditional maximum likelihood estimation. The approach is demonstrated with an application to spelling tasks. We thank Eric Maris for his helpful comments.  相似文献   

8.
The four-parameter logistic (4PL) item response model, which includes an upper asymptote for the correct response probability, has drawn increasing interest due to its suitability for many practical scenarios. This paper proposes a new Gibbs sampling algorithm for estimation of the multidimensional 4PL model based on an efficient data augmentation scheme (DAGS). With the introduction of three continuous latent variables, the full conditional distributions are tractable, allowing easy implementation of a Gibbs sampler. Simulation studies are conducted to evaluate the proposed method and several popular alternatives. An empirical data set was analysed using the 4PL model to show its improved performance over the three-parameter and two-parameter logistic models. The proposed estimation scheme is easily accessible to practitioners through the open-source IRTlogit package.  相似文献   

9.
10.
In tailored testing, it is important to determine the optimal difficulty of the next item to present to the examinee. This paper shows that the difference that maximizes information for the three-parameter normal ogive response model is approximately 1.7 times the optimal differenceb for the three-parameter logistic model. Under the normal model, calculation of the optimal difficulty for minimizing the Bayes risk is equivalent to maximizing an associated information function.The views expressed herein, are those of the author and do not necessarily reflect those of the Department of the Navy.  相似文献   

11.
This paper concerns items that consist of several item steps to be responded to sequentially. The item scoreX is defined as the number of correct responses until the first failure. Samejima's graded response model states that each steph=1,...,m is characterized by a parameterb h , and, for a subject with ability, Pr(Xh; )=F(–b h ). Tutz's general sequential model associates with each step a parameterdh, and it states that Pr(Xh;)= r =1h G(d r ). Tutz's (1991, 1997) conjectures that the models are equivalent if and only ifF(x)=G(x) is an extreme value distribution. This paper presents a proof for this conjecture.  相似文献   

12.
A test theory using only ordinal assumptions is presented. It is based on the idea that the test items are a sample from a universe of items. The sum across items of the ordinal relations for a pair of persons on the universe items is analogous to a true score. Using concepts from ordinal multiple regression, it is possible to estimate the tau correlations of test items with the universe order from the taus among the test items. These in turn permit the estimation of the tau of total score with the universe. It is also possible to estimate the odds that the direction of a given observed score difference is the same as that of the true score difference. The estimates of the correlations between items and universe and between total score and universe are found to agree well with the actual values in both real and artificial data.Part of this paper was presented at the June, 1989, Meeting of the Psychometric Society. The authors wish to thank several reviewers for their suggestions. This research was mainly done while the second author was a University Fellow at the University of Southern California.  相似文献   

13.
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.  相似文献   

14.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, that may be defined at any level of an hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It will be shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. In this article, handling measurement error via the normal ogive model is compared with alternative approaches using the classical true score model. Examples using real data are given.This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.  相似文献   

15.
Some standard errors in item response theory   总被引:2,自引:0,他引:2  
The mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations. It is shown that the maximum likelihood estimation of a lower asymptote can wreak havoc with the accuracy of estimation of a location parameter, indicating that if one needs to have accurate estimates of location parameters (say for purposes of test linking/equating or computerized adaptive testing) the sample sizes required for acceptable accuracy may be unattainable in most applications. It is suggested that other estimation methods be used if the three parameter model is applied in these situations.The research reported here was supported, in part, by contract #F41689-81-6-0012 from the Air Force Human Resources Laboratory to McFann-Gray & Associates, Benjamin A. Fairbank, Jr., Principal Investigator. Further support of Wainer's effort was supplied by the Educational Testing Service, Program Statistics Research Project.  相似文献   

16.
Information functions are used to find the optimum ability levels and maximum contributions to information for estimating item parameters in three commonly used logistic item response models. For the three and two parameter logistic models, examinees who contribute maximally to the estimation of item difficulty contribute little to the estimation of item discrimination. This suggests that in applications that depend heavily upon the veracity of individual item parameter estimates (e.g. adaptive testing or text construction), better item calibration results may be obtained (for fixed sample sizes) from examinee calibration samples in which ability is widely dispersed.This work was supported by Contract No. N00014-83-C-0457, project designation NR 150-520, from Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research and Educational Testing Service through the Program Research Planning Council. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to acknowledge the invaluable assistance of Maxine B. Kingston in carrying out this study, and to thank Charles Lewis for his many insightful comments on earlier drafts of this paper.  相似文献   

17.
Multidimensional item response theory (MIRT) models for response style (e.g., Bolt, Lu, & Kim, 2014, Psychological Methods, 19, 528; Falk & Cai, 2016, Psychological Methods, 21, 328) provide flexibility in accommodating various response styles, but often present difficulty in isolating the effects of response style(s) from the intended substantive trait(s). In the presence of such measurement limitations, we consider several ways in which MIRT models are nevertheless useful in lending insight into how response styles may interfere with measurement for a given test instrument. Such a study can also inform whether alternative design considerations (e.g., anchoring vignettes, self-report items of heterogeneous content) that seek to control for response style effects may be helpful. We illustrate several aspects of an MIRT approach using real and simulated analyses.  相似文献   

18.
Multidimensional item response theory (MIRT) is widely used in assessment and evaluation of educational and psychological tests. It models the individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that involve either direct numerical approximations to the integrals or Monte Carlo simulations. However, these methods are known to be computationally demanding in high dimensions and rely on sampling data points from a posterior distribution. We propose a new Gaussian variational expectation--maximization (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. In addition, the proposed algorithm can be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies are conducted to demonstrate the computational efficiency and estimation precision of the new GVEM algorithm compared to the popular alternative Metropolis–Hastings Robbins–Monro algorithm. In addition, theoretical results are presented to establish the consistency of the estimator from the new GVEM algorithm.  相似文献   

19.
解释性项目反应理论模型(Explanatory Item Response Theory Models, EIRTM)是指基于广义线性混合模型和非线性混合模型构建的项目反应理论(Item Response Theory, IRT)模型。EIRTM能在IRT模型的基础上直接加入预测变量, 从而解决各类测量问题。首先介绍EIRTM的相关概念和参数估计方法, 然后展示如何使用EIRTM处理题目位置效应、测验模式效应、题目功能差异、局部被试依赖和局部题目依赖, 接着提供实例对EIRTM的使用进行说明, 最后对EIRTM的不足之处和应用前景进行讨论。  相似文献   

20.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed.This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号