1.

Limited‐information goodness‐of‐fit testing of item response theory models for sparse 2^P tables

《The British journal of mathematical and statistical psychology》2006,59(1):173-194

Bartholomew and Leung proposed a limited‐information goodness‐of‐fit test statistic (Y) for models fitted to sparse 2^P contingency tables. The null distribution of Y was approximated using a chi‐squared distribution by matching moments. The moments were derived under the assumption that the model parameters were known in advance and it was conjectured that the approximation would also be appropriate when the parameters were to be estimated. Using maximum likelihood estimation of the two‐parameter logistic item response theory model, we show that the effect of parameter estimation on the distribution of Y is too large to be ignored. Consequently, we derive the asymptotic moments of Y for maximum likelihood estimation. We show using a simulation study that when the null distribution of Y is approximated using moments that take into account the effect of estimation, Y becomes a very useful statistic to assess the overall goodness of fit of models fitted to sparse 2^P tables.

2.

Random Item IRT Models

Paul De Boeck 《Psychometrika》2008,73(4):533-559

It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising to handle several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and for its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined, but instead a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.

3.

Empirical comparison of item parameters based on the logistic and normal functions

Frank B. Baker 《Psychometrika》1961,26(2):239-246

Maximum likelihood estimates of item parameters of a scholastic aptitude test were computed using the normal and logistic models. The goodness of fit of ogives specified by the pairs of item parameters to the observed data was determined for all items. While negligible differences in the limen values were found, differences in item discrimination indices indicated that interpretation of these indices requires separate frames of reference. The empirical results showed the logistic model to be a useful alternative to the normal model in item analysis.
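As a rough illustration of why discrimination indices from the two models need separate frames of reference, the sketch below compares a two-parameter logistic curve with a normal ogive. The function names and the scaling constant D = 1.702 are assumptions introduced here (the constant is the conventional rescaling, not something stated in the abstract), and the grid search is only a numerical check, not the paper's procedure.

```python
import math

def logistic_icc(theta, a, b, D=1.702):
    """Two-parameter logistic item characteristic curve with scaling constant D."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def normal_ogive_icc(theta, a, b):
    """Two-parameter normal ogive item characteristic curve (standard normal CDF)."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_abs_gap(a=1.0, b=0.0, D=1.702, lo=-6.0, hi=6.0, steps=1201):
    """Largest absolute difference between the two curves on an evenly spaced grid."""
    gap = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        diff = abs(logistic_icc(theta, a, b, D) - normal_ogive_icc(theta, a, b))
        gap = max(gap, diff)
    return gap
```

With D = 1.702 the two curves stay within about 0.01 of each other, whereas with D = 1 the same discrimination value describes visibly different curves. This is one way to see that a discrimination index is only interpretable relative to the model it came from.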

4.

Rasch analysis and item reduction of the hypomanic personality scale

David M. Meads Richard P. Bentall 《Personality and individual differences》2008,44(8):1772-1783

The aim of the current study was to reduce the number of items in the 48-item hypomanic personality scale (HPS) and determine whether a unidimensional scale of the hypomanic trait could be derived. Previously collected HPS data from University students (n = 318) were applied to the Rasch model (one-parameter item response theory). Overall scale and individual item fit statistics were used to judge fit to the model and item maps employed to determine coverage of the trait. Cronbach’s Alpha and correlations with other questionnaires pre- and post-item reduction were evaluated. Rasch analysis indicated that the original HPS was not unidimensional, had significant redundancy and differential item functioning by age and gender. An iterative process of item reduction produced a 20-item HPS (HPS-20) that retained the concepts of the original HPS and had excellent fit to the Rasch model (χ² p = 0.27). Unidimensionality of the HPS-20 was confirmed. The traditional psychometric properties of the HPS-20 and coverage of the underlying hypomanic construct were similar to the original. It was possible to derive a unidimensional measure of the hypomanic trait. Further use of the HPS-20 is encouraged as it may increase understanding of the risk factors for affective disorders.

5.

Logistic positive exponent family of models: Virtue of asymmetric item characteristic curves

Fumiko Samejima 《Psychometrika》2000,65(3):319-335

The paper addresses and discusses whether the tradition of accepting point-symmetric item characteristic curves is justified by uncovering the inconsistent relationship between the difficulties of items and the order of maximum likelihood estimates of ability. This inconsistency is intrinsic in models that provide point-symmetric item characteristic curves, and in this paper focus is put on the normal ogive model for observation. It is also questioned if in the logistic model the sufficient statistic has forfeited the rationale that is appropriate to the psychological reality. It is observed that the logistic model can be interpreted as the case in which the inconsistency in ordering the maximum likelihood estimates is degenerated. The paper proposes a family of models, called the logistic positive exponent family, which provides asymmetric item characteristic curves. A model in this family has a consistent principle in ordering the maximum likelihood estimates of ability. The family is divided into two subsets each of which has its own principle, and includes the logistic model as a transition from one principle to the other. Rationale and some illustrative examples are given.
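A minimal sketch of the idea, assuming the usual form of the logistic positive exponent curve: a logistic curve raised to a positive exponent, here called xi. The function names, parameter names and the symmetry check are illustrative choices made here, not taken from the abstract. With xi = 1 the curve reduces to the logistic model; for other values it is no longer point-symmetric about the point where it crosses 0.5.

```python
import math

def logistic(theta, a=1.0, b=0.0):
    """Ordinary two-parameter logistic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lpe_icc(theta, a=1.0, b=0.0, xi=1.0):
    """Logistic curve raised to a positive exponent xi (xi = 1 gives the logistic model)."""
    if xi <= 0:
        raise ValueError("xi must be positive")
    return logistic(theta, a, b) ** xi

def half_point(a=1.0, b=0.0, xi=1.0):
    """Ability value at which the curve equals 0.5."""
    u = 0.5 ** (1.0 / xi)  # logistic value whose xi-th power is 0.5
    return b + math.log(u / (1.0 - u)) / a

def symmetry_gap(xi, d=1.0, a=1.0, b=0.0):
    """P(t0 + d) + P(t0 - d) - 1 around the half point t0; zero for a point-symmetric curve."""
    t0 = half_point(a, b, xi)
    return lpe_icc(t0 + d, a, b, xi) + lpe_icc(t0 - d, a, b, xi) - 1.0
```

Calling `symmetry_gap(1.0)` returns a value that is zero up to rounding, while `symmetry_gap(2.0)` is clearly nonzero, which is the asymmetry the family is built to provide.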

6.

Testing the Rasch model by means of the mixture fit index

《The British journal of mathematical and statistical psychology》2006,59(1):89-95

Rudas, Clogg, and Lindsay (RCL) proposed a new index of fit for contingency table analysis. Using the overparametrized two‐component mixture, where the first component with weight 1 − w represents the model to be tested and the second component with weight w is unstructured, the mixture index of fit was defined to be the smallest w compatible with the saturated two‐component mixture. This index of fit, which is insensitive to sample size, is applied to the problem of assessing the fit of the Rasch model. In this application, use is made of the equivalence of the semi‐parametric version of the Rasch model to specifically restricted latent class models. Therefore, the Rasch model can be represented by the structured component of the RCL mixture, with this component itself consisting of two or more subcomponents corresponding to the classes, and the unstructured component capturing the discrepancies between the data and the model. An empirical example demonstrates the application of this approach. Based on four‐item data, the one‐ and two‐class unrestricted latent class models and the one‐ to three‐class models restricted according to the Rasch model are considered, with respect to both their chi‐squared statistics and their mixture fit indices.
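The definition of the index can be made concrete with a small sketch. It assumes a single fixed candidate distribution for the structured component, whereas the RCL index additionally minimizes over every distribution in the model family, so the value computed here is only an upper bound on the index. The function name and the example numbers are hypothetical.

```python
def mixture_index_fixed(observed, model):
    """Smallest w such that observed = (1 - w) * model + w * other
    for some probability distribution `other`, with `model` held fixed.

    Returns (w, other); `other` is None when w == 0.
    """
    # (1 - w) * model[i] may not exceed observed[i] in any cell,
    # so the largest admissible 1 - w is the smallest cell ratio.
    ratio = min(p / q for p, q in zip(observed, model) if q > 0)
    w = 1.0 - min(1.0, ratio)
    if w == 0.0:
        return 0.0, None
    other = [(p - (1.0 - w) * q) / w for p, q in zip(observed, model)]
    return w, other

# Hypothetical three-cell table and candidate model distribution.
w, other = mixture_index_fixed([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
```

In this toy case a quarter of the population has to be placed in the unstructured component before the rest can be described by the candidate distribution. Because the computation uses only cell proportions, the result does not change with sample size.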

7.

Item bias detection using loglinear irt

Henk Kelderman 《Psychometrika》1989,54(4):681-697

A method is proposed for the detection of item bias with respect to observed or unobserved subgroups. The method uses quasi-loglinear models for the incomplete subgroup × test score × item 1 × ... × item k contingency table. If subgroup membership is unknown the models are Haberman's incomplete-latent-class models. The (conditional) Rasch model is formulated as a quasi-loglinear model. The parameters in this loglinear model, that correspond to the main effects of the item responses, are the conditional estimates of the parameters in the Rasch model. Item bias can then be tested by comparing the quasi-loglinear-Rasch model with models that contain parameters for the interaction of item responses and the subgroups. The author thanks Wim J. van der Linden and Gideon J. Mellenbergh for comments and suggestions and Frank Kok for empirical data.

8.

The many null distributions of person fit indices (cited by: 1; self-citations: 0; cited by others: 1)

Ivo W. Molenaar Herbert Hoijtink 《Psychometrika》1990,55(1):75-106

This paper deals with the situation of an investigator who has collected the scores of n persons to a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution. The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
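The third specification, conditioning on the total score, can be sketched for a short test by exhaustive enumeration. Under the Rasch model the probability of a response pattern given its total score is proportional to exp(−Σ xᵢbᵢ) and does not involve the ability parameter. The sketch assumes known item difficulties and ranks patterns by their conditional likelihood; the function names, the tail definition and the example difficulties are choices made here for illustration, not the paper's exact rules.

```python
import math
from itertools import combinations

def conditional_pattern_probs(difficulties, score):
    """P(pattern | total score) under the Rasch model for every pattern with that score."""
    k = len(difficulties)
    weights = {}
    for solved in combinations(range(k), score):
        pattern = tuple(1 if i in solved else 0 for i in range(k))
        # The ability parameter cancels once we condition on the total score.
        weights[pattern] = math.exp(-sum(difficulties[i] for i in solved))
    total = sum(weights.values())
    return {pattern: wgt / total for pattern, wgt in weights.items()}

def conditional_person_fit_pvalue(pattern, difficulties):
    """Probability, given the total score, of a pattern no more likely than the observed one."""
    probs = conditional_pattern_probs(difficulties, sum(pattern))
    p_obs = probs[tuple(pattern)]
    # Small relative tolerance so ties are counted despite rounding.
    return sum(p for p in probs.values() if p <= p_obs * (1.0 + 1e-12))

# Hypothetical five-item test, ordered from easiest to hardest.
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_consistent = conditional_person_fit_pvalue((1, 1, 0, 0, 0), b)  # solved the two easiest items
p_aberrant = conditional_person_fit_pvalue((0, 0, 0, 1, 1), b)    # solved only the two hardest
```

Both respondents have the same total score, yet the second pattern falls far into the tail of the conditional distribution and would be flagged as aberrant under any conventional threshold. Enumeration grows combinatorially with test length, which is presumably why longer tests call for the Monte Carlo or moment-based approximations the abstract mentions.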

9.

Estimating the π* goodness of fit index for finite mixtures of item response models

Javier Revuelta 《The British journal of mathematical and statistical psychology》2008,61(1):93-113

Testing the fit of finite mixture models is a difficult task, since asymptotic results on the distribution of likelihood ratio statistics do not hold; for this reason, alternative statistics are needed. This paper applies the π* goodness of fit statistic to finite mixture item response models. The π* statistic assumes that the population is composed of two subpopulations – those that follow a parametric model and a residual group outside the model; π* is defined as the proportion of population in the residual group. The population was divided into two or more groups, or classes. Several groups followed an item response model and there was also a residual group. The paper presents maximum likelihood algorithms for estimating item parameters, the probabilities of the groups and π*. The paper also includes a simulation study on goodness of recovery for the two‐ and three‐parameter logistic models and an example with real data from a multiple choice test.

10.

Evaluation of global testing procedures for item fit to the Rasch model

《The British journal of mathematical and statistical psychology》2003,56(1):127-143

Two types of global testing procedures for item fit to the Rasch model were evaluated using simulation studies. The first type incorporates three tests based on first‐order statistics: van den Wollenberg's Q₁ test, Glas's R₁ test, and Andersen's LR test. The second type incorporates three tests based on second‐order statistics: van den Wollenberg's Q₂ test, Glas's R₂ test, and a non‐parametric test proposed by Ponocny. The Type I error rates and the power against the violation of parallel item response curves, unidimensionality and local independence were analysed in relation to sample size and test length. In general, the outcomes indicate a satisfactory performance of all tests, except the Q₂ test which exhibits an inflated Type I error rate. Further, it was found that both types of tests have power against all three types of model violation. A possible explanation is the interdependencies among the assumptions underlying the model.

11.

EDITOR'S NOTE

《International Journal of Testing》2013,13(1):1-2

In the framework of a linear logistic testing model, Mislevy, Sheehan, and Wingersky (1993) showed how to incorporate collateral information in estimating item parameters required for test equating. The purpose of the study was to explore the feasibility of applying this method to equate tests constructed for college entrance examination by comparing its results with those of the item response theory (IRT) true-score equating. Overall, the equating results based on collateral information are relatively comparable with those of IRT equating. In terms of R² values, the prediction equations for item characteristics are good to excellent. The significance levels of correlation coefficients between IRT calibrated b (difficulty level) and predicted b parameters range from around .01 to .05. The goodness of fit of true-score test characteristic curves (TCCs) based on collateral information to IRT true-score TCCs is excellent. Results of the study are discussed in light of factors that may affect the validity of using collateral information in test equating.

12.

Some neglected problems in IRT

Gerhard H. Fischer 《Psychometrika》1995,60(4):459-487

The paper addresses three neglected questions from IRT. In section 1, the properties of the “measurement” of ability or trait parameters and item difficulty parameters in the Rasch model are discussed. It is shown that the solution to this problem is rather complex and depends both on general assumptions about properties of the item response functions and on assumptions about the available item universe. Section 2 deals with the measurement of individual change or “modifiability” based on a Rasch test. A conditional likelihood approach is presented that yields (a) an ML estimator of modifiability for given item parameters, (b) allows one to test hypotheses about change by means of a Clopper-Pearson confidence interval for the modifiability parameter, or (c) to estimate modifiability jointly with the item parameters. Uniqueness results for all three methods are also presented. In section 3, the Mantel-Haenszel method for detecting DIF is discussed under a novel perspective: What is the most general framework within which the Mantel-Haenszel method correctly detects DIF of a studied item? The answer is that this is a 2PL model where, however, all discrimination parameters are known and the studied item has the same discrimination in both populations. Since these requirements would hardly be satisfied in practical applications, the case of constant discrimination parameters, that is, the Rasch model, is the only realistic framework. A simple Pearson χ² test for DIF of one studied item is proposed as an alternative to the Mantel-Haenszel test; moreover, this test is generalized to the case of two items simultaneously studied for DIF.

13.

Cognitive Complexity in the Remote Association Test - Chinese Version

Su-Pin Hung Po-Sheng Huang Hsueh-Chih Chen 《Creativity Research Journal》2016,28(4):442-449

The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT—Chinese Version (RAT-C) using the Rasch model and the linear logistic test model (LLTM). The revised 30-item RAT-C was administered to 475 undergraduates (263 women and 212 men) in 8 universities in Taiwan. Item features (including types of associations among stimulus words, and frequency and concreteness of target words) were recoded. The analysis found that the RAT-C measured a single latent construct, with all 30 items conforming to the Rasch model’s expectation. Furthermore, according to the LLTM analysis, most item features predicted Rasch item difficulty, suggesting that these features can explain why some items were more difficult than others and can be used to create new items with known item difficulty to tailor the difficulty level for different groups of participants in the future.

14.

Limited‐information goodness‐of‐fit testing of hierarchical item factor models

Li Cai Mark Hansen 《The British journal of mathematical and statistical psychology》2013,66(2):245-276

In applications of item response theory, assessment of model fit is a critical issue. Recently, limited‐information goodness‐of‐fit testing has received increased attention in the psychometrics literature. In contrast to full‐information test statistics such as Pearson’s X² or the likelihood ratio G², these limited‐information tests utilize lower‐order marginal tables rather than the full contingency table. A notable example is Maydeu‐Olivares and colleagues’ M₂ family of statistics based on univariate and bivariate margins. When the contingency table is sparse, tests based on M₂ retain better Type I error rate control than the full‐information tests and can be more powerful. While in principle the M₂ statistic can be extended to test hierarchical multidimensional item factor models (e.g., bifactor and testlet models), the computation is non‐trivial. To obtain M₂, a researcher often has to obtain (many thousands of) marginal probabilities, derivatives, and weights. Each of these must be approximated with high‐dimensional numerical integration. We propose a dimension reduction method that can take advantage of the hierarchical factor structure so that the integrals can be approximated far more efficiently. We also propose a new test statistic that can be substantially better calibrated and more powerful than the original M₂ statistic when the test is long and the items are polytomous. We use simulations to demonstrate the performance of our new methods and illustrate their effectiveness with applications to real data.

15.

Psychological Test Calibration Using the Rasch Model—Some Critical Suggestions on Traditional Approaches

《International Journal of Testing》2013,13(4):377-394

In this article, we emphasize that the Rasch model is not only very useful for psychological test calibration but is also necessary if the number of solved items is to be used as an examinee's score. Simplified proof that the Rasch model implies specific objective parameter comparisons is given. Consequently, a model check per se is possible. For data and item pools that fail to fit the Rasch model, various reasons are listed. For instance, the two-parameter logistic or three-parameter logistic models would probably be more suitable. Several suggestions are given for controlling the overall Type I risk, for including a power analysis (i.e., taking the Type II risk into account), for disclosing artificial model check results, and for the deletion of Rasch model misfitting examinees. These suggestions are empirically founded and may serve in the establishment of certain rough state-of-the-art standards. However, a degree of statistical elaboration is needed; and forthcoming test authors will still suffer from the fact that no standard software exists that offers all of the given approaches as a package.

16.

A Cautionary Note on Using G²(dif) to Assess Relative Model Fit in Categorical Data Analysis

《Multivariate behavioral research》2013,48(1):55-64

The likelihood ratio test statistic G²(dif) is widely used for comparing the fit of nested models in categorical data analysis. In large samples, this statistic is distributed as a chi-square with degrees of freedom equal to the difference in degrees of freedom between the tested models, but only if the least restrictive model is correctly specified. Yet, this statistic is often used in applications without assessing the adequacy of the least restrictive model. This may result in incorrect substantive conclusions as the above large sample reference distribution for G²(dif) is no longer appropriate. Rather, its large sample distribution will depend on the degree of model misspecification of the least restrictive model. To illustrate this, a simulation study is performed where this statistic is used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G²(dif) was found to be robust only under small model misspecification of the least restrictive model. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before employing G²(dif) to assess relative model fit.

17.

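Using the testlet response model to mitigate the harm of local item dependence in multistage testing

Peida Zhan Chunlei Gao Yufang Bian Zhaosheng Luo 《心理科学》 (Journal of Psychological Science) 2017,40(1):216-223

Multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to build each module and panel under specified constraints. However, if latent factors overlooked during test construction give rise to local item dependence (LID) among items, the MST results will be harmed. To investigate the harm that LID does to MST, this study first introduces MST, LID and related concepts, and then examines the question through a simulation study. The results show that LID reduces the precision of examinee ability estimates, although the estimation bias remains small, and that the harm is not limited to any particular routing rule. To remove this harm, the testlet response model was then used as the analysis model during MST administration; the results show that this approach removes part of the harm but that its effect is limited. On the one hand, this indicates that the harm LID does to the precision of ability estimation in MST deserves attention; on the other hand, it indicates that methods for eliminating the harm caused by LID in MST still merit further investigation.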

18.

Loglinear Rasch model tests (cited by: 1; self-citations: 0; cited by others: 1)

Hendrikus Kelderman 《Psychometrika》1984,49(2):223-245

Existing statistical tests for the fit of the Rasch model have been criticized, because they are only sensitive to specific violations of its assumptions. Contingency table methods using loglinear models have been used to test various psychometric models. In this paper, the assumptions of the Rasch model are discussed and the Rasch model is reformulated as a quasi-independence model. The model is a quasi-loglinear model for the incomplete subgroup × score × item 1 × item 2 × ... × item k contingency table. Using ordinary contingency table methods the Rasch model can be tested generally or against less restrictive quasi-loglinear models to investigate specific violations of its assumptions.

19.

Limits on Log Odds Ratios for Unidimensional Item Response Theory Models

Shelby J. Haberman Paul W. Holland Sandip Sinharay 《Psychometrika》2007,72(4):551-561

Bounds are established for log odds ratios (log cross-product ratios) involving pairs of items for item response models. First, expressions for bounds on log odds ratios are provided for one-dimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are also illustrated through an example from a study of model-checking procedures. The bounds obtained can provide an elementary basis for assessment of goodness of fit of these models. Any opinions expressed in this publication are those of the authors and not necessarily those of the Educational Testing Service. The authors thank Dan Eignor, Matthias von Davier, Lydia Gladkova, Brian Junker, and the three anonymous reviewers for their invaluable advice. The authors gratefully acknowledge the help of Kim Fryer with proofreading.

20.

Testing the assumptions and interpreting the results of the Rasch model using log-linear procedures in SPSS

Elisabeth TenVergert Michael Gillespie Johannes Kingma 《Behavior research methods》1993,25(3):350-359

This paper shows how to use the log-linear subroutine of SPSS to fit the Rasch model. It also shows how to fit less restrictive models obtained by relaxing specific assumptions of the Rasch model. Conditional maximum likelihood estimation was achieved by including dummy variables for the total scores as covariates in the models. This approach greatly simplifies the specification of the Rasch models. We illustrate these procedures in an analysis of four items selected from the Reiss Premarital Sexual Permissiveness Scale. We found that a modified version of the Rasch model with item dependencies fits the data significantly better than the simple Rasch model. We also found that the item difficulties are the same for men and women, but that the item dependencies are significantly greater for men. Apart from any substantive issues these results raise, the value of this exercise lies in its demonstration of how researchers can use the procedures of popular, accessible software packages to study an increasingly important set of measurement models.