Similar articles
 20 similar articles found
1.
In applications of item response theory, assessment of model fit is a critical issue. Recently, limited‐information goodness‐of‐fit testing has received increased attention in the psychometrics literature. In contrast to full‐information test statistics such as Pearson's X² or the likelihood ratio G², these limited‐information tests utilize lower‐order marginal tables rather than the full contingency table. A notable example is Maydeu‐Olivares and colleagues' M₂ family of statistics based on univariate and bivariate margins. When the contingency table is sparse, tests based on M₂ retain better Type I error rate control than the full‐information tests and can be more powerful. While in principle the M₂ statistic can be extended to test hierarchical multidimensional item factor models (e.g., bifactor and testlet models), the computation is non‐trivial. To obtain M₂, a researcher often has to obtain (many thousands of) marginal probabilities, derivatives, and weights. Each of these must be approximated with high‐dimensional numerical integration. We propose a dimension reduction method that can take advantage of the hierarchical factor structure so that the integrals can be approximated far more efficiently. We also propose a new test statistic that can be substantially better calibrated and more powerful than the original M₂ statistic when the test is long and the items are polytomous. We use simulations to demonstrate the performance of our new methods and illustrate their effectiveness with applications to real data.

2.
In sparse tables for categorical data, well‐known goodness‐of‐fit statistics are not chi‐square distributed. A consequence is that model selection becomes a problem. It has been suggested that a way out of this problem is the use of the parametric bootstrap. In this paper, the parametric bootstrap goodness‐of‐fit test is studied by means of an extensive simulation study; the Type I error rates and power of this test are studied under several conditions of sparseness. In the presence of sparseness, models were used that were likely to violate the regularity conditions. Besides bootstrapping the goodness‐of‐fit statistics usually used (full information statistics), corrected versions of these statistics and a limited information statistic are bootstrapped. These bootstrap tests were also compared to an asymptotic test using limited information. Results indicate that bootstrapping the usual statistics fails because these tests are too liberal, and that bootstrapping or asymptotically testing the limited information statistic works better with respect to Type I error and outperforms the other statistics by far in terms of statistical power. The properties of all tests are illustrated using categorical Markov models.
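
The parametric bootstrap test described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's procedure: the three-cell binomial model, the function names, and the choice of Pearson's X² as the bootstrapped statistic are all hypothetical stand-ins for the categorical Markov models and statistics the paper actually examines.

```python
import random


def fit_probs(counts):
    """ML fit of a hypothetical 3-cell model: cell k has probability Binomial(2, p)."""
    n = sum(counts)
    p = (counts[1] + 2 * counts[2]) / (2 * n)
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def pearson_x2(counts, probs):
    """Pearson's X2 for observed counts against fitted cell probabilities."""
    n = sum(counts)
    # Cells with fitted probability 0 are skipped; they are empty under the fit.
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs) if q > 0)


def simulate(n, probs, rng):
    """Draw one multinomial sample of size n from the fitted probabilities."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts


def bootstrap_p_value(counts, n_boot=2000, seed=0):
    """Return (observed statistic, parametric bootstrap p-value)."""
    rng = random.Random(seed)
    n = sum(counts)
    fitted = fit_probs(counts)
    observed_stat = pearson_x2(counts, fitted)
    exceed = 0
    for _ in range(n_boot):
        sample = simulate(n, fitted, rng)
        # Refit the model on every bootstrap sample before computing the statistic.
        if pearson_x2(sample, fit_probs(sample)) >= observed_stat:
            exceed += 1
    return observed_stat, (exceed + 1) / (n_boot + 1)
```

The reference distribution comes from data simulated under the fitted model rather than from a chi-square table, which is why the approach is attractive when the table is sparse. Whether it controls the Type I error rate still depends on the statistic being bootstrapped, as the abstract reports.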

3.
Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample size, full‐information test statistics such as Pearson's X² and the likelihood ratio statistic G² suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited‐information fit statistics such as Maydeu‐Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu‐Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ statistic to diagnostic classification models. Through a series of simulation studies, we found that M₂ is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q‐matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M₂ was largely insensitive to misspecifications in the distribution of higher‐order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of the overall model goodness of fit using M₂, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic X²_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The X²_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the M₂ and X²_LD statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).

4.
Multinomial models are increasingly being used in psychology, and this use always requires estimating model parameters and testing goodness of fit with a composite null hypothesis. Goodness of fit is customarily tested with recourse to the asymptotic approximation to the distribution of the statistics. An assessment of the quality of this approximation requires a comparison with the exact distribution, but how to compute this exact distribution when parameters are estimated from the data appears never to have been defined precisely. The main goal of this paper is to compare two different approaches to defining this exact distribution. One of the approaches uses the marginal distribution and is, therefore, independent of the data; the other approach uses the conditional distribution of the statistics given the estimated parameters and, therefore, is data‐dependent. We carried out a thorough study involving various parameter estimation methods and goodness‐of‐fit statistics, all of them members of the general class of power‐divergence measures. Included in the study were multinomial models with three to five cells and up to three parameters. Our results indicate that the asymptotic distribution is rarely a good approximation to the exact marginal distribution of the statistics, whereas it is a good approximation to the exact conditional distribution only when the vector of expected frequencies is interior to the sample space of the multinomial distribution.
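
For readers unfamiliar with the power‐divergence class mentioned in this abstract, a minimal sketch of the Cressie-Read family follows. The function name and the index symbol `lam` are illustrative choices, not taken from the paper. The sketch assumes observed and expected counts share the same total, and it handles zero observed counts only for `lam >= 0`.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic with index lam.

    lam = 1 gives Pearson's X2; the limit lam -> 0 gives the likelihood ratio G2.
    """
    if abs(lam) < 1e-12:
        # Limiting case lam -> 0; empty cells contribute nothing to G2.
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))
```

For example, `power_divergence([10, 20, 30], [20, 20, 20], 1.0)` reproduces Pearson's X² for that table, and `lam=0.0` reproduces G². All members of the family share the same asymptotic chi-square reference distribution, and the abstract questions how good that approximation is in small multinomial models.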

5.
6.
The nontruncated marginal of a truncated bivariate normal distribution   Total citations: 7 (self-citations: 0; citations by others: 7)
Inference is considered for the marginal distribution of X, when (X, Y) has a truncated bivariate normal distribution. The Y variable is truncated, but only the X values are observed. The relationship of this distribution to Azzalini's skew-normal distribution is obtained. Method of moments and maximum likelihood estimation are compared for the three-parameter Azzalini distribution. Samples that are uninformative about the skewness of this distribution may occur, even for large n. Profile likelihood methods are employed to describe the uncertainty involved in parameter estimation. A sample of 87 Otis test scores is shown to be well-described by this model.

7.
Maximum likelihood estimation in confirmatory factor analysis requires large sample sizes, normally distributed item responses, and reliable indicators of each latent construct, but these ideals are rarely met. We examine alternative strategies for dealing with non‐normal data, particularly when the sample size is small. In two simulation studies, we systematically varied: the degree of non‐normality; the sample size from 50 to 1000; the way of indicator formation, comparing items versus parcels; the parcelling strategy, evaluating parcels with uniformly positive skew and kurtosis versus those with counterbalancing skew and kurtosis; and the estimation procedure, contrasting maximum likelihood and asymptotically distribution‐free methods. We evaluated the convergence behaviour of solutions, as well as the systematic bias and variability of parameter estimates, and goodness of fit.

8.
Testing the fit of finite mixture models is a difficult task, since asymptotic results on the distribution of likelihood ratio statistics do not hold; for this reason, alternative statistics are needed. This paper applies the π* goodness of fit statistic to finite mixture item response models. The π* statistic assumes that the population is composed of two subpopulations – those that follow a parametric model and a residual group outside the model; π* is defined as the proportion of the population in the residual group. The population was divided into two or more groups, or classes. Several groups followed an item response model and there was also a residual group. The paper presents maximum likelihood algorithms for estimating item parameters, the probabilities of the groups and π*. The paper also includes a simulation study on goodness of recovery for the two‐ and three‐parameter logistic models and an example with real data from a multiple choice test.

9.
This paper presents the asymptotic expansions of the distributions of the two‐sample t‐statistic and the Welch statistic, for testing the equality of the means of two independent populations under non‐normality. Unlike other approaches, we obtain the null distributions in terms of the distribution and density functions of the standard normal variable up to n⁻¹, where n is the pooled sample size. Based on these expansions, monotone transformations are employed to remove the higher‐order cumulant effect. We show that the new statistics can improve the precision of statistical inference to the level of o(n⁻¹). Numerical studies are carried out to demonstrate the performance of the improved statistics. Some general rules for practitioners are also recommended.

10.
A family of Root Mean Square Error of Approximation (RMSEA) statistics is proposed for assessing the goodness of approximation in discrete multivariate analysis with applications to item response theory (IRT) models. The family includes RMSEAs to assess the approximation up to any level of association of the discrete variables. Two members of this family are RMSEA₂, which uses up to bivariate moments, and the full information RMSEAₙ. The RMSEA₂ is estimated using the M₂ statistic of Maydeu-Olivares and Joe (2005, 2006), whereas for maximum likelihood estimation, RMSEAₙ is estimated using Pearson's X² statistic. Using IRT models, we provide cutoff criteria of adequate, good, and excellent fit using the RMSEA₂. When the data are ordinal, we find a strong linear relationship between the RMSEA₂ and the Standardized Root Mean Squared Residual goodness-of-fit index. We are unable to offer cutoff criteria for the RMSEAₙ as its population values decrease as the number of variables and categories increase.
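
As a rough illustration of how an RMSEA point estimate is obtained from a chi-square-type fit statistic, the sketch below uses the commonly cited sample formula sqrt(max((T − df) / (N · df), 0)). The abstract does not spell out the estimator, so treating this as the paper's exact definition is an assumption; some sources divide by N − 1 instead of N. The function and argument names are hypothetical.

```python
import math


def rmsea(stat, df, n):
    """Sample RMSEA from a fit statistic (e.g. a limited-information statistic),
    its degrees of freedom, and the sample size n. Assumed formula, see lead-in."""
    if df <= 0 or n <= 0:
        raise ValueError("df and n must be positive")
    # Truncate at zero so that statistics below their df give RMSEA = 0.
    return math.sqrt(max((stat - df) / (n * df), 0.0))
```

For instance, a statistic of 150 on 100 degrees of freedom with 500 respondents gives `rmsea(150, 100, 500)`, roughly 0.032. Dividing by the degrees of freedom is what makes values comparable across models of different size.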

11.
The likelihood ratio test statistic G²(dif) is widely used for comparing the fit of nested models in categorical data analysis. In large samples, this statistic is distributed as a chi-square with degrees of freedom equal to the difference in degrees of freedom between the tested models, but only if the least restrictive model is correctly specified. Yet, this statistic is often used in applications without assessing the adequacy of the least restrictive model. This may result in incorrect substantive conclusions as the above large sample reference distribution for G²(dif) is no longer appropriate. Rather, its large sample distribution will depend on the degree of model misspecification of the least restrictive model. To illustrate this, a simulation study is performed where this statistic is used to compare nested item response theory models under various degrees of misspecification of the least restrictive model. G²(dif) was found to be robust only under small model misspecification of the least restrictive model. Consequently, we argue that some indication of the absolute goodness of fit of the least restrictive model is needed before employing G²(dif) to assess relative model fit.

12.
The many null distributions of person fit indices   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper deals with the situation of an investigator who has collected the scores of n persons to a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution. The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.

13.
When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X², (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M₂ statistic applied to bivariate subtables. The unadjusted Pearson's X² with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X² is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

14.
Multivariate count data are commonly analysed by using Poisson distributions with varying intensity parameters, resulting in a random‐effects model. In the analysis of a data set on the frequency of different emotion experiences we find that a Poisson model with a single random effect does not yield an adequate fit. An alternative model that requires as many random effects as emotion categories requires high‐dimensional integration and the estimation of a large number of parameters. As a solution to these computational problems, we propose a factor‐analytic Poisson model and show that a two‐dimensional factor model fits the reported data very well. Moreover, it yields a substantively satisfactory solution: one factor describing the degree of pleasantness and unpleasantness of emotions and the other factor describing the activation levels of the emotions. We discuss the incorporation of covariates to facilitate rigorous tests of the random‐effects structure. Marginal maximum likelihood methods lead to straightforward estimation of the model, for which goodness‐of‐fit tests are also presented.

15.
Several authors have studied or used the following estimation strategy for meta‐analysing correlations: obtain a point estimate or confidence interval for the mean Fisher z correlation, and transform this estimate to the Pearson r metric. Using the relationship between Fisher z and Pearson r random variables, I demonstrate the potential discrepancy induced by directly z‐to‐r transforming a mean correlation parameter. Point and interval estimators based on an alternative integral z‐to‐r transformation are proposed. Analytic expressions for the expectation and variance of certain meta‐analytic point estimators are also provided, as are selected moments of correlation parameters; numerical examples are included. In an application of these analytic results, the proposed point estimator outperformed its usual direct z‐to‐r counterpart and compared favourably with an estimator based on Pearson r correlations. Practical implications, extensions of the proposed estimators, and uses for the analytic results are discussed.
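
The discrepancy between the direct and the integral z‐to‐r transformation can be illustrated numerically. The sketch below assumes, purely for illustration, that the Fisher z parameters are normally distributed across studies with mean `mu_z` and standard deviation `tau`. The abstract does not state the distributional form or the exact estimators, so the function names and the trapezoidal integration are hypothetical choices.

```python
import math


def direct_z_to_r(mu_z):
    """Back-transform the mean Fisher z directly: r = tanh(mean z)."""
    return math.tanh(mu_z)


def integral_z_to_r(mu_z, tau, n_points=2001, width=8.0):
    """Mean of tanh(z) when z ~ Normal(mu_z, tau^2), by the trapezoidal rule.

    The normal distribution for z is an assumption made for this sketch.
    """
    if tau == 0:
        return math.tanh(mu_z)
    lower = mu_z - width * tau
    step = 2.0 * width * tau / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        z = lower + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        density = math.exp(-0.5 * ((z - mu_z) / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        total += weight * math.tanh(z) * density
    return total * step
```

With `mu_z = 0.5` and `tau = 0.3`, the direct transform gives about 0.46 while the integral transform gives about 0.43. Because tanh is nonlinear, the two agree only when there is no between-study variation in z.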

16.
In the past two decades, statistical modelling with sparsity has become an active research topic in the fields of statistics and machine learning. Recently, Huang, Chen and Weng (2017, Psychometrika, 82, 329) and Jacobucci, Grimm, and McArdle (2016, Structural Equation Modeling: A Multidisciplinary Journal, 23, 555) both proposed sparse estimation methods for structural equation modelling (SEM). These methods, however, are restricted to performing single-group analysis. The aim of the present work is to establish a penalized likelihood (PL) method for multi-group SEM. Our proposed method decomposes each group model parameter into a common reference component and a group-specific increment component. By penalizing the increment components, the heterogeneity of parameter values across the population can be explored since the null group-specific effects are expected to diminish. We developed an expectation-conditional maximization algorithm to optimize the PL criteria. A numerical experiment and a real data example are presented to demonstrate the potential utility of the proposed method.

17.
In the present paper, a general class of heteroscedastic one‐factor models is considered. In these models, the residual variances of the observed scores are explicitly modelled as parametric functions of the one‐dimensional factor score. A marginal maximum likelihood procedure for parameter estimation is proposed under both the assumption of multivariate normality of the observed scores conditional on the single common factor score and the assumption of normality of the common factor score. A likelihood ratio test is derived, which can be used to test the usual homoscedastic one‐factor model against one of the proposed heteroscedastic models. Simulation studies are carried out to investigate the robustness and the power of this likelihood ratio test. Results show that the asymptotic properties of the test statistic hold under both small test length conditions and small sample size conditions. Results also show under what conditions the power to detect different heteroscedasticity parameter values is either small, medium, or large. Finally, for illustrative purposes, the marginal maximum likelihood estimation procedure and the likelihood ratio test are applied to real data.

18.
Parameters of the two‐parameter logistic model are generally estimated via the expectation–maximization (EM) algorithm by the maximum‐likelihood (ML) method. In so doing, it is beneficial to estimate the common prior distribution of the latent ability from data. Full non‐parametric ML (FNPML) estimation allows estimation of the latent distribution with maximum flexibility, as the distribution is modelled non‐parametrically on a number of (freely moving) support points. It is generally assumed that EM estimation of the two‐parameter logistic model is not influenced by initial values, but studies on this topic are unavailable. Therefore, the present study investigates the sensitivity to initial values in FNPML estimation. In contrast to the common assumption, initial values are found to have notable influence: for a standard convergence criterion, item discrimination and difficulty parameter estimates as well as item characteristic curve (ICC) recovery were influenced by initial values. For more stringent criteria, item parameter estimates were mainly influenced by the initial latent distribution, whilst ICC recovery was unaffected. The reason for this might be a flat surface of the log‐likelihood function, which would necessitate setting a sufficiently tight convergence criterion for accurate recovery of item parameters.

19.
Despite the compelling nature of goodness of fit, empirical support has lagged for this construct. The present study examined an interactional approach to measuring goodness of fit and prospectively explored associations with mother–child relationship quality, child behaviour problems and parenting stress across the preschool period. In addition, as goodness of fit might be particularly important for children at developmental risk, the presence of early developmental delay was considered as a moderator of goodness‐of‐fit processes. Children with (n = 110) and without (n = 137) developmental delays and their mothers were coded while interacting in the lab at child age 36 months and during naturalistic home observations at child ages 36 and 48 months. Mothers also completed questionnaires at child age 60 months. Results highlight the effects of child developmental risk as a moderator of mother–child goodness‐of‐fit processes across the preschool period. There was also evidence that the goodness of fit between maternal scaffolding and child activity level at 36 months influenced both mother and child functioning at 60 months. Findings call for more precise models and expanded developmental perspectives to fully capture the transactional and dynamic nature of goodness of fit. Copyright © 2016 John Wiley & Sons, Ltd.

20.
The supplemented EM (SEM) algorithm is applied to address two goodness‐of‐fit testing problems in psychometrics. The first problem involves computing the information matrix for item parameters in item response theory models. This matrix is important for limited‐information goodness‐of‐fit testing and it is also used to compute standard errors for the item parameter estimates. For the second problem, it is shown that the SEM algorithm provides a convenient computational procedure that leads to an asymptotically chi‐squared goodness‐of‐fit statistic for the ‘two‐stage EM’ procedure of fitting covariance structure models in the presence of missing data. Both simulated and real data are used to illustrate the proposed procedures.
