For each Rasch (Masters) partial credit item, there exists a set of independent Rasch binary and indecomposable trinary items for which the sum of the scores and the partial credit score have identical probability density functions. If each indecomposable trinary item is further expressed as the sum of two binary items, then the binary items are positively dependent and cannot both be of the Rasch type. This paper was written while the author was working with Steve Ferrara and Hillary Michaels on some technical aspects of the Maryland School Performance Assessment Program. The author had been puzzled by the fact that most MSPAP assessment items have three or fewer score categories. With a psychometric justification now being apparent, this paper is dedicated to both of them.

This paper discusses the application of a class of Rasch models to situations where test items are grouped into subsets and the common attributes of items within these subsets bring into question the usual assumption of conditional independence. The models are all expressed as particular cases of the random coefficients multinomial logit model developed by Adams and Wilson. This formulation allows a very flexible approach to the specification of alternative models, and makes model testing particularly straightforward. The use of the models is illustrated using item bundles constructed in the framework of the SOLO taxonomy of Biggs and Collis. The work of both authors was supported by fellowships from the National Academy of Education Spencer Fellowship.

Extensions of the partial credit model
The partial credit model, developed by Masters (1982), is a unidimensional latent trait model for responses scored in two or more ordered categories. In the present paper some extensions of the model are presented. First, a marginal maximum likelihood estimation procedure is developed which allows for incomplete data and linear restrictions on both the item and the population parameters. Secondly, two statistical tests for evaluating model fit are presented: the former test has power against violation of the assumption about the ability distribution, the latter test offers the possibility of identifying specific items that do not fit the model. The authors are indebted to Professor Wim van der Linden and Huub Verstralen for their helpful comments.
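
As a reference point for the model being extended, the category probabilities of the partial credit model can be sketched as follows. This is only a minimal illustration, assuming the common step-difficulty parameterization in which P(X = k | theta) is proportional to exp of the sum over j <= k of (theta - delta_j). The function name `pcm_probs` and the example values are hypothetical and not taken from the paper, and the sketch does not implement the marginal maximum likelihood procedure or the fit tests described in the abstract.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities for one partial credit item.

    theta  : person ability
    deltas : step difficulties delta_1..delta_m (assumed parameterization)
    Returns a list of m + 1 probabilities for scores 0..m.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for score 0 is 0.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item with symmetric step difficulties.
probs = pcm_probs(0.0, [-1.0, 1.0])
```

With a single step difficulty the function returns the dichotomous Rasch probabilities, which is a quick sanity check on the parameterization assumed here.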

The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability $\theta$). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.

Analysing ordinal data is becoming increasingly important in psychology, especially in the context of item response theory. The generalized partial credit model (GPCM) is probably the most widely used ordinal model and has found application in many large-scale educational assessment studies such as PISA. In the present paper, optimal test designs are investigated for estimating persons’ abilities with the GPCM for calibrated tests when item parameters are known from previous studies. We find that local optimality may be achieved by assigning non-zero probability only to the first and last categories independently of a person's ability. That is, when using such a design, the GPCM reduces to the dichotomous two-parameter logistic (2PL) model. Since locally optimal designs require the true ability to be known, we consider alternative Bayesian design criteria using weight distributions over the ability parameter space. For symmetric weight distributions, we derive necessary conditions for the optimal one-point design of two response categories to be Bayes optimal. Furthermore, we discuss examples of common symmetric weight distributions and investigate under what circumstances the necessary conditions are also sufficient. Since the 2PL model is a special case of the GPCM, all of these results hold for the 2PL model as well.
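
The closing remark of this abstract, that the 2PL model is a special case of the GPCM, can be checked numerically. The sketch below assumes the usual GPCM form with a discrimination `a` and step parameters `bs`, and uses the standard result that the Fisher information for ability under this form equals `a**2` times the conditional variance of the item score. The names `gpcm_probs`, `gpcm_information` and `two_pl` are illustrative, and nothing here reproduces the paper's optimal design derivations.

```python
import math

def gpcm_probs(theta, a, bs):
    """GPCM category probabilities for scores 0..m (assumed parameterization)."""
    exponents = [0.0]
    for b in bs:
        exponents.append(exponents[-1] + a * (theta - b))
    shift = max(exponents)  # stabilizes the exponentials
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def gpcm_information(theta, a, bs):
    """Fisher information for theta: a^2 times the variance of the item score."""
    probs = gpcm_probs(theta, a, bs)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (second - mean * mean)

def two_pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical values: with one step parameter the GPCM is the 2PL model.
theta, a, b = 0.7, 1.3, -0.2
p_gpcm = gpcm_probs(theta, a, [b])[1]
p_2pl = two_pl(theta, a, b)
```

In the two-category case the information reduces to `a**2 * p * (1 - p)`, the familiar 2PL expression.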

The partial credit model is considered under the assumption of a certain linear decomposition of the item × category parameters $\beta_{ih}$ into basic parameters $\alpha_j$. This model is referred to as the linear partial credit model. A conditional maximum likelihood algorithm for estimation of the $\alpha_j$ is presented, based on (a) recurrences for the combinatorial functions involved, and (b) using a quasi-Newton approach, the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) method; (a) guarantees numerically stable results, (b) avoids the direct computation of the Hesse matrix, yet produces a sequence of certain positive definite matrices $B_k$, $k = 1, 2, \ldots$, converging to the asymptotic variance-covariance matrix of the $\hat{\alpha}_j$. The practicality of these numerical methods is demonstrated both by means of simulations and of an empirical application to the measurement of treatment effects in patients with psychosomatic disorders. The authors thank one anonymous reviewer for his constructive comments. Moreover, they thankfully acknowledge financial support by the Österreichische Nationalbank (Austrian National Bank) under Grant No. 3720.
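
The combinatorial functions that enter a conditional likelihood of this kind can be built up item by item. The following is a minimal sketch of a generic summation recurrence, not necessarily the recurrences used by the authors. It assumes each item contributes weights `eps[h]` for scores h = 0..m with `eps[0] = 1`, and that the function for total score r is the sum, over all response patterns with that total, of the product of the item weights. The name `gamma_functions` and the inputs are hypothetical.

```python
def gamma_functions(item_eps):
    """Combinatorial functions gamma_0..gamma_R for a set of polytomous items.

    item_eps[i][h] is the (non-negative) weight of score h on item i,
    with item_eps[i][0] == 1 for the zero score (an assumption of this sketch).
    """
    gammas = [1.0]  # no items yet: only total score 0 is possible
    for eps in item_eps:
        updated = [0.0] * (len(gammas) + len(eps) - 1)
        # Adding one item: every old total r can be raised by any score h.
        for r, g in enumerate(gammas):
            for h, e in enumerate(eps):
                updated[r + h] += g * e
        gammas = updated
    return gammas

# Hypothetical weights: one three-category item and one dichotomous item.
example = gamma_functions([[1.0, 2.0, 4.0], [1.0, 3.0]])
```

Because the recurrence only adds products of non-negative terms, it involves no subtractive cancellation, which is one common reason such summation schemes are regarded as numerically stable.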

A loglinear IRT model is proposed that relates polytomously scored item responses to a multidimensional latent space. The analyst may specify a response function for each response, indicating which latent abilities are necessary to arrive at that response. Each item may have a different number of response categories, so that free response items are more easily analyzed. Conditional maximum likelihood estimates are derived and the models may be tested generally or against alternative loglinear IRT models. Hank Kelderman is currently affiliated with Vrije Universiteit, Amsterdam. We thank Linda Vodegel-Matzen of the Division of Developmental Psychology of the University of Amsterdam for making available the data used in the example in this article.

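Inconsistency in subjective scoring lowers the reliability of the scores. The many-facet Rasch model, which is based on item response theory, can be used to identify and eliminate rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, describes typical applications from abroad, and discusses the conditions under which the model can be applied.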

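Using the many-facet Rasch model (MFRM) from item response theory (IRT), this study analyzes assessment results from the leaderless group discussion (LGD) used in assessment centers. It examines examinee ability levels, rater severity, the internal consistency of ratings, dimension difficulty, and rating categories, and then discusses various sources of bias. Analyzing personnel assessment results with the MFRM gives a deeper understanding of true differences in examinee ability, distinguishes the difficulty of the dimensions, and detects sources of assessment error. This in turn helps to improve the construction of assessment items, to evaluate or diagnose whether raters are qualified, and to improve the match between assessment dimensions and assessment purposes, offering a distinctive perspective for extending the use of item response theory in personnel assessment.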

An experimental Rasch analysis of scoring in the HSK subjective tests
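Inconsistency in subjective scoring lowers the reliability of the scores. The many-facet Rasch model, which is based on item response theory, can be used to identify and eliminate rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, designs an application framework based on the model for quality control of scoring in the HSK subjective tests, and validates it experimentally using HSK essay scoring data.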

This paper concerns items that consist of several item steps to be responded to sequentially. The item score $X$ is defined as the number of correct responses until the first failure. Samejima's graded response model states that each step $h = 1, \ldots, m$ is characterized by a parameter $b_h$, and, for a subject with ability $\theta$, $\Pr(X \geq h; \theta) = F(\theta - b_h)$. Tutz's general sequential model associates with each step a parameter $d_h$, and it states that $\Pr(X \geq h; \theta) = \prod_{r=1}^{h} G(\theta - d_r)$. Tutz (1991, 1997) conjectured that the models are equivalent if and only if $F(x) = G(x)$ is an extreme value distribution. This paper presents a proof for this conjecture.
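
The "if" direction of the conjecture can be illustrated numerically for one particular extreme value form. The sketch below assumes the Gumbel distribution function exp(-exp(-x)) for both F and G; this choice, the function names, and the example values are assumptions made for illustration, not taken from the paper. Under that assumption the product of step probabilities collapses to a single Gumbel term with threshold b_h = log(exp(d_1) + ... + exp(d_h)).

```python
import math

def gumbel_cdf(x):
    """Gumbel (extreme value) distribution function, assumed for F and G."""
    return math.exp(-math.exp(-x))

def sequential_survival(theta, ds):
    """Pr(X >= h) for h = 1..m under the sequential model: running product."""
    out, prod = [], 1.0
    for d in ds:
        prod *= gumbel_cdf(theta - d)
        out.append(prod)
    return out

def matching_thresholds(ds):
    """Graded-response thresholds b_h = log(sum of exp(d_r) for r <= h)."""
    out, running = [], 0.0
    for d in ds:
        running += math.exp(d)
        out.append(math.log(running))
    return out

def graded_survival(theta, bs):
    """Pr(X >= h) for h = 1..m under the graded response model."""
    return [gumbel_cdf(theta - b) for b in bs]

# Hypothetical step parameters and ability.
ds = [-0.5, 0.3, 1.1]
seq = sequential_survival(0.4, ds)
grm = graded_survival(0.4, matching_thresholds(ds))
```

The two lists agree up to floating-point error for any ability value, and the implied thresholds increase with h, as a graded response model requires. The sketch says nothing about the "only if" direction, which is the substantive part of the proof.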

A new model, called the acceleration model, is proposed in the framework of the heterogeneous case of the graded response model, based on processing functions defined for a finite or enumerable number of steps. The model is expected to be useful in cognitive assessment, as well as in more traditional areas of application of latent trait models. Criteria for evaluating models are proposed, and soundness and robustness of the acceleration model are discussed. Graded response models based on individual choice behavior are also discussed, and criticisms of model selection based on the fit of models to the data are also given. This research was supported by the Office of Naval Research (N00014-90-J-1456).

A necessary and sufficient condition is given in this paper for the existence and uniqueness of the maximum likelihood (the so-called joint maximum likelihood) estimate of the parameters of the Partial Credit Model. This condition is stated in terms of a structural property of the pattern of the data matrix that can be easily verified on the basis of a simple iterative procedure. The result is proved by using an argument of Haberman (1977). The author wishes to thank the Editor and the anonymous reviewers for their comments that helped to substantially improve the final version of this paper. This research was supported in part by a MURST grant (ex 60%).

To increase knowledge of the psychometric properties of the 70-item Danish Word Association Test, data from three samples of non-patients and psychiatric patients (N = 326) were used to provide two measures of affectivity of the stimulus words, response heterogeneity and reaction time prolongation. It was possible to fit an item response theory one-parameter measurement (Rasch) model to the number of reaction time prolongations (≥ 3 seconds) for 54 of the stimulus words. Correlation between Rasch-model item parameters and response heterogeneity was high (r = 0.86), while no correlation was found between either of these measures and frequency of the stimulus words in the Danish language. Both measures of stimulus affectivity supported a theoretically based classification of stimulus words as emotional or neutral. Response heterogeneity measures and Rasch measurement item and person parameters for reaction time prolongations are provided.

We conducted two experimental studies with between-subjects and within-subjects designs to investigate the item response process for personality measures administered in high- versus low-stakes situations. Apart from assessing measurement validity of the item response process, we examined predictive validity; that is, whether or not different response models entail differential selection outcomes. We found that ideal point response models fit slightly better than dominance response models across high- versus low-stakes situations in both studies. Additionally, fitting ideal point models to the data led to fewer items displaying differential item functioning compared to fitting dominance models. We also identified several items that functioned as intermediate items in both the faking and honest conditions when ideal point models were fitted, suggesting that the ideal point model is “theoretically” more suitable across these contexts for personality inventories. However, the use of different response models (dominance vs. ideal point) did not have any substantial impact on the validity of personality measures in high-stakes situations, or on the effectiveness of selection decisions such as mean performance or percent of fakers selected. These findings are significant in that, although prior research supports the importance and use of ideal point models for measuring personality, we find that in the case of personality faking, though ideal point models seem to have slightly better measurement validity, the use of dominance models may be adequate with no loss to predictive validity.

