Found 20 similar articles (search time: 15 ms)
1.
A number of models for categorical item response data have been proposed in recent years. The models appear to be quite different.
However, they may usefully be organized as members of only three distinct classes, within which the models are distinguished
only by assumptions and constraints on their parameters. “Difference models” are appropriate for ordered responses, “divide-by-total”
models may be used for either ordered or nominal responses, and “left-side added” models are used for multiple-choice responses
with guessing. The details of the taxonomy and the models are described in this paper.
The present study was supported in part by two postdoctoral fellowships awarded to Lynne Steinberg: an Educational Testing
Service Postdoctoral Fellowship at ETS, Princeton, NJ and an NIMH Individual National Research Service Award at Stanford University,
Stanford, CA. Helpful comments by the editor and three anonymous reviewers are gratefully acknowledged.
2.
Various item response theory (IRT) models can be used in educational and psychological measurement to analyze test data. One of the major drawbacks of these models is that efficient parameter estimation can only be achieved with very large data sets. Therefore, it is often worthwhile to search for designs of the test data that in some way will optimize the parameter estimates. The results from the statistical theory on optimal design can be applied for efficient estimation of the parameters. A major problem in finding an optimal design for IRT models is that the designs are only optimal for a given set of parameters, that is, they are locally optimal. Locally optimal designs can be constructed with a sequential design procedure. In this paper minimax designs are proposed for IRT models to overcome the problem of local optimality. Minimax designs are compared to sequentially constructed designs for the two-parameter logistic model, and the results show that minimax designs can be nearly as efficient as sequentially constructed designs.
3.
A Bayesian procedure is developed for the estimation of parameters in the two-parameter logistic item response model. Joint modal estimates of the parameters are obtained and procedures for the specification of prior information are described. Through simulation studies it is shown that Bayesian estimates of the parameters are superior to maximum likelihood estimates in the sense that they are (a) more meaningful since they do not drift out of range, and (b) more accurate in that they result in smaller mean squared differences between estimates and true values. The research reported here was performed pursuant to Grant No. N0014-79-C-0039 with the Office of Naval Research.
4.
Wendy M. Yen. Psychometrika, 1985, 50(4): 399-410
When the three-parameter logistic model is applied to tests covering a broad range of difficulty, there frequently is an increase in mean item discrimination and a decrease in variance of item difficulties and traits as the tests become more difficult. To examine the hypothesis that this unexpected scale shrinkage effect occurs because the items increase in complexity as they increase in difficulty, an approximate relationship is derived between the unidimensional model used in data analysis and a multidimensional model hypothesized to be generating the item responses. Scale shrinkage is successfully predicted for several sets of simulated data. The author is grateful to Robert Mislevy for kindly providing a copy of his computer program, RESOLVE.
5.
An IRT model based on the Rasch model is proposed for composite tasks, that is, tasks that are decomposed into subtasks of
different kinds. There is one subtask for each component that is discerned in the composite tasks. A component is a generic
kind of subtask of which the subtasks resulting from the decomposition are specific instantiations with respect to the particular
composite tasks under study. The proposed model constrains the difficulties of the composite tasks to be linear combinations
of the difficulties of the corresponding subtask items, which are estimated together with the weights used in the linear combinations,
one weight for each kind of subtask. Although the model does not belong to the exponential family, its parameters can be estimated
using conditional maximum likelihood estimation. The approach is demonstrated with an application to spelling tasks.
We thank Eric Maris for his helpful comments.
6.
John H. Wolfe. Psychometrika, 1981, 46(4): 461-464
In tailored testing, it is important to determine the optimal difficulty of the next item to present to the examinee. This paper shows that the difference $\theta - b$ that maximizes information for the three-parameter normal ogive response model is approximately 1.7 times the optimal difference $\theta - b$ for the three-parameter logistic model. Under the normal model, calculation of the optimal difficulty for minimizing the Bayes risk is equivalent to maximizing an associated information function. The views expressed herein are those of the author and do not necessarily reflect those of the Department of the Navy.
7.
This paper concerns items that consist of several item steps to be responded to sequentially. The item score $X$ is defined as the number of correct responses until the first failure. Samejima's graded response model states that each step $h = 1, \ldots, m$ is characterized by a parameter $b_h$, and, for a subject with ability $\theta$, $\Pr(X \ge h; \theta) = F(\theta - b_h)$. Tutz's general sequential model associates with each step a parameter $d_h$, and it states that $\Pr(X \ge h; \theta) = \prod_{r=1}^{h} G(\theta - d_r)$. Tutz (1991, 1997) conjectured that the models are equivalent if and only if $F(x) = G(x)$ is an extreme value distribution. This paper presents a proof for this conjecture.
8.
A test theory using only ordinal assumptions is presented. It is based on the idea that the test items are a sample from a universe of items. The sum across items of the ordinal relations for a pair of persons on the universe items is analogous to a true score. Using concepts from ordinal multiple regression, it is possible to estimate the tau correlations of test items with the universe order from the taus among the test items. These in turn permit the estimation of the tau of total score with the universe. It is also possible to estimate the odds that the direction of a given observed score difference is the same as that of the true score difference. The estimates of the correlations between items and universe and between total score and universe are found to agree well with the actual values in both real and artificial data. Part of this paper was presented at the June 1989 Meeting of the Psychometric Society. The authors wish to thank several reviewers for their suggestions. This research was mainly done while the second author was a University Fellow at the University of Southern California.
9.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It will be shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. In this article, handling measurement error via the normal ogive model is compared with alternative approaches using the classical true score model. Examples using real data are given. This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.
10.
Some standard errors in item response theory (total citations: 2; self-citations: 0; citations by others: 2)
The mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations. It is shown that the maximum likelihood estimation of a lower asymptote can wreak havoc with the accuracy of estimation of a location parameter, indicating that if one needs to have accurate estimates of location parameters (say, for purposes of test linking/equating or computerized adaptive testing) the sample sizes required for acceptable accuracy may be unattainable in most applications. It is suggested that other estimation methods be used if the three-parameter model is applied in these situations. The research reported here was supported, in part, by contract #F41689-81-6-0012 from the Air Force Human Resources Laboratory to McFann-Gray & Associates, Benjamin A. Fairbank, Jr., Principal Investigator. Further support of Wainer's effort was supplied by the Educational Testing Service, Program Statistics Research Project.
11.
Martha L. Stocking. Psychometrika, 1990, 55(3): 461-475
Information functions are used to find the optimum ability levels and maximum contributions to information for estimating item parameters in three commonly used logistic item response models. For the three- and two-parameter logistic models, examinees who contribute maximally to the estimation of item difficulty contribute little to the estimation of item discrimination. This suggests that in applications that depend heavily upon the veracity of individual item parameter estimates (e.g., adaptive testing or test construction), better item calibration results may be obtained (for fixed sample sizes) from examinee calibration samples in which ability is widely dispersed. This work was supported by Contract No. N00014-83-C-0457, project designation NR 150-520, from Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research and Educational Testing Service through the Program Research Planning Council. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to acknowledge the invaluable assistance of Maxine B. Kingston in carrying out this study, and to thank Charles Lewis for his many insightful comments on earlier drafts of this paper.
12.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability $\theta$). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
13.
Caution indices based on item response theory (total citations: 2; self-citations: 0; citations by others: 2)
Kikumi K. Tatsuoka. Psychometrika, 1984, 49(1): 95-110
A new family of indices was introduced earlier as a link between two approaches: one based on item response theory and the other on sample statistics. In this study, the statistical properties of these indices are investigated, and then the relationships to Guttman scales and to item and person response curves are discussed. Further, these indices are standardized, and an example of their potential usefulness for diagnosing students' misconceptions is shown. This research was sponsored by the Personnel and Training Research Program, Psychological Sciences Division, Office of Naval Research, under contract No. N00014-82-K-0604.
14.
Fumiko Samejima. Psychometrika, 2000, 65(3): 319-335
The paper addresses and discusses whether the tradition of accepting point-symmetric item characteristic curves is justified by uncovering the inconsistent relationship between the difficulties of items and the order of maximum likelihood estimates of ability. This inconsistency is intrinsic in models that provide point-symmetric item characteristic curves, and in this paper focus is put on the normal ogive model for observation. It is also questioned if in the logistic model the sufficient statistic has forfeited the rationale that is appropriate to the psychological reality. It is observed that the logistic model can be interpreted as the case in which the inconsistency in ordering the maximum likelihood estimates is degenerated. The paper proposes a family of models, called the logistic positive exponent family, which provides asymmetric item characteristic curves. A model in this family has a consistent principle in ordering the maximum likelihood estimates of ability. The family is divided into two subsets, each of which has its own principle, and includes the logistic model as a transition from one principle to the other. Rationale and some illustrative examples are given.
15.
A general latent trait model for response processes (total citations: 1; self-citations: 0; citations by others: 1)
Susan Embretson. Psychometrika, 1984, 49(2): 175-186
The purpose of the current paper is to propose a general multicomponent latent trait model (GLTM) for response processes. The proposed model combines the linear logistic latent trait model (LLTM) with the multicomponent latent trait model (MLTM). As with both LLTM and MLTM, the general multicomponent latent trait model can be used to (1) test hypotheses about the theoretical variables that underlie response difficulty and (2) estimate parameters that describe test items by basic substantive properties. However, GLTM contains both component outcomes and complexity factors in a single model and may be applied to data that neither LLTM nor MLTM can handle. Joint maximum likelihood estimators are presented for the parameters of GLTM and an application to cognitive test items is described. This research was partially supported by the National Institute of Education grant number NIE-6-7-0156 to Susan Embretson (Whitely), principal investigator. However, the opinions expressed herein do not necessarily reflect the position or policy of the National Institute of Education, and no official endorsement by the National Institute of Education should be inferred.
16.
Replenishing item pools for on-line ability testing requires innovative and efficient data collection designs. By generating local D-optimal designs for selecting individual examinees, and consistently estimating item parameters in the presence of error in the design points, sequential procedures are efficient for on-line item calibration. The estimating error in the on-line ability values is accounted for with an item parameter estimate studied by Stefanski and Carroll. Locally D-optimal n-point designs are derived using the branch-and-bound algorithm of Welch. In simulations, the overall sequential designs appear to be considerably more efficient than random seeding of items. This report was prepared under the Navy Manpower, Personnel, and Training R&D Program of the Office of the Chief of Naval Research under Contract N00014-87-0696. The authors wish to acknowledge the valuable advice and consultation given by Ronald Armstrong, Charles Davis, Bradford Sympson, Zhaobo Wang, Ing-Long Wu and three anonymous reviewers.
17.
In this paper we propose two interpretations for the discrimination parameter in the two-parameter logistic model (2PLM).
The interpretations are based on the relation between the 2PLM and two stochastic models. In the first interpretation, the
2PLM is linked to a diffusion model so that the probability of absorption equals the 2PLM. The discrimination parameter is
the distance between the two absorbing boundaries and therefore the amount of information that has to be collected before
a response to an item can be given. For the second interpretation, the 2PLM is connected to a specific type of race model.
In the race model, the discrimination parameter is inversely related to the dependency of the information used in the decision
process. Extended versions of both models with person-to-person variability in the difficulty parameter are considered. When
fitted to a data set, it is shown that a generalization of the race model that allows for dependency between choices and response
times (RTs) is the best-fitting model.
18.
Wim J. van der Linden. Psychometrika, 1998, 63(2): 201-216
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized. Portions of this paper were presented at the 60th annual meeting of the Psychometric Society, Minneapolis, Minnesota, June 1995. The author is indebted to Wim M. M. Tielen for his computational support.
19.
A model is presented for item responses when different subjects employ different strategies, but only responses, not choice of strategy, can be observed. Using substantive theory to differentiate the likelihoods of response vectors under a fixed set of strategies, we model response probabilities in terms of item parameters for each strategy, proportions of subjects employing each strategy, and distributions of subject proficiency within strategies. The probabilities that an individual subject employed the various strategies can then be obtained, along with a conditional estimate of proficiency under each. A conceptual example discusses response strategies for spatial rotation tasks, and a numerical example resolves a population of subjects into subpopulations of valid responders and random guessers. The first author's work was supported by Contract No. N00014-85-K-0683, project designation NR 150-539, from the Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research. We are grateful to Murray Aitkin, Isaac Bejar, Neil Dorans, Frederiksen, and Marklyn Wingersky for their comments and suggestions, and to Alison Gooding, Maxine Kingston, Donna Lembeck, Joling Liang, and Kentaro Yamamoto for their assistance with Example 2.
20.
Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken (1971) and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four classical lower bounds to reliability. Finally, recommendations are given concerning the use of these estimation methods. The authors are grateful for constructive comments from the reviewers and from Charles Lewis.