Similar Documents
20 similar documents found.
1.
A method for dealing with the problem of missing observations in multivariate data is developed and evaluated. The method uses a transformation of the principal components of the data to estimate missing entries. The properties of this method and four alternative methods are investigated by means of a Monte Carlo study of 42 computer-generated data matrices. The methods are compared with respect to their ability to predict correlation matrices as well as missing entries. The results indicate that whenever there exist modest intercorrelations among the variables (i.e., an average off-diagonal correlation above .2), the proposed method is at least as good as the best alternative (a regression method) while being considerably faster and simpler computationally. Models for determining the best alternative based on easily calculated characteristics of the matrix are given. The generality of these models is demonstrated using the previously published results of Timm.
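
The abstract does not say which transformation of the principal components is used, so the sketch below substitutes a generic technique: iterative rank-1 principal-component imputation. Missing cells start at their column means, the first principal axis of the column-centered matrix is found by power iteration, each missing cell is replaced by its rank-1 reconstruction, and the cycle repeats until the filled-in values stop changing. The function name, the rank-1 choice, and the stopping rule are assumptions for illustration, not the authors' procedure.

```python
import math


def impute_rank1_pca(X, missing, n_iter=500, tol=1e-10):
    """Fill the cells in `missing` with a rank-1 principal-component reconstruction.

    X: list of rows of floats (values at missing cells are ignored).
    missing: set of (row, col) index pairs.
    Returns a new matrix; X itself is not modified.
    """
    n, p = len(X), len(X[0])
    X = [row[:] for row in X]
    # Start by filling each missing cell with its column's observed mean.
    for j in range(p):
        obs = [X[i][j] for i in range(n) if (i, j) not in missing]
        for i in range(n):
            if (i, j) in missing:
                X[i][j] = sum(obs) / len(obs)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        # Center the current (filled-in) matrix by its column means.
        means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
        C = [[X[i][j] - means[j] for j in range(p)] for i in range(n)]
        # Power iteration on C^T C gives the first principal axis v.
        S = [[sum(C[i][a] * C[i][b] for i in range(n)) for b in range(p)]
             for a in range(p)]
        for _ in range(50):
            w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Replace each missing cell by column mean + (row score * loading).
        max_change = 0.0
        for i, j in missing:
            score = sum(C[i][k] * v[k] for k in range(p))
            new = means[j] + score * v[j]
            max_change = max(max_change, abs(new - X[i][j]))
            X[i][j] = new
        if max_change < tol:
            break
    return X
```

For example, in a 5 x 3 matrix whose rows are (t, 2t, 3t) for t = 1, ..., 5 with the last entry deleted, the iterations should recover a value close to 15, because the centered complete matrix has rank 1.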

2.
Bentler PM, Yuan KH. Psychometrika, 2011, 76(1): 119-123
Indefinite symmetric matrices that are estimates of positive-definite population matrices occur in a variety of contexts, such as correlation matrices computed from incomplete data using pairwise-present observations and multinormal-based methods for discretized variables. This note describes a methodology for scaling selected off-diagonal rows and columns of such a matrix to achieve positive definiteness. In contrast to recently developed ridge procedures, the proposed method does not require the variables to contain measurement error. When minimum trace factor analysis is used to implement the theory, only correlations that are associated with Heywood cases are shrunk.
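
A simplified illustration of the scaling idea, assuming the row and column to shrink have already been chosen (the note selects them through minimum trace factor analysis, which is not reproduced here). Positive definiteness is checked with a Cholesky factorization, and a bisection search finds approximately the largest factor c in [0, 1] for which multiplying that row's and column's off-diagonal entries by c gives a positive-definite matrix. The helper names are hypothetical.

```python
def is_positive_definite(A):
    """Return True if the symmetric matrix A admits a Cholesky factorization."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def scale_row_col(R, j, c):
    """Multiply the off-diagonal entries of row j and column j by c."""
    out = [row[:] for row in R]
    for k in range(len(R)):
        if k != j:
            out[j][k] = R[j][k] * c
            out[k][j] = R[k][j] * c
    return out


def largest_pd_scale(R, j, steps=60):
    """Bisect for (approximately) the largest c in [0, 1] making R positive definite.

    Assumes the block of R without variable j is positive definite, so c = 0 works
    and the admissible factors form an interval starting at 0.
    """
    if is_positive_definite(R):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_positive_definite(scale_row_col(R, j, mid)):
            lo = mid
        else:
            hi = mid
    return lo
```

For the indefinite matrix [[1, .9, .9], [.9, 1, .2], [.9, .2, 1]], scaling the first variable's correlations should give a factor of about 0.86; any smaller factor also yields a positive-definite matrix, which is what makes bisection valid here.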

3.
Existing test statistics for assessing whether incomplete data represent a missing completely at random (MCAR) sample from a single population are based on a normal likelihood rationale and effectively test for homogeneity of means and covariances across missing data patterns. The likelihood approach cannot be implemented adequately if a pattern of missing data contains very few subjects. A generalized least squares rationale is used to develop parallel tests that are expected to be more stable in small samples. Three factors were varied in a simulation: number of variables, percentage of data missing completely at random, and sample size. One thousand data sets were simulated for each condition. The generalized least squares test of homogeneity of means performed close to the ideal Type I error rate for most of the conditions. The generalized least squares test of homogeneity of covariance matrices and a combined test also performed quite well. Preliminary results on this research were presented at the 1999 Western Psychological Association convention, Irvine, CA, and in the UCLA Statistics Preprint No. 265 (http://www.stat.ucla.edu). The assistance of Ke-Hai Yuan and several anonymous reviewers is gratefully acknowledged.

4.
5.
The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increasing the likelihood that the missing at random assumption holds is to include many covariates as so-called auxiliary variables. These variables are included either on the basis of data considerations or in an inclusive fashion, that is, by taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways to avoid selecting bias-inducing covariates as auxiliary variables.

6.
Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis: listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package, bmem, is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate their application. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables, and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many of the studied conditions, they may provide useful information for exploring missing data mechanisms.
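
To make the contrast between the two deletion strategies concrete, here is a minimal sketch (the data layout, with None marking a missing value, and the function names are our own assumptions): listwise deletion keeps only cases observed on every variable, while pairwise deletion uses every case observed on the two variables being correlated. Because each pairwise correlation can rest on a different subset of cases, a matrix assembled this way need not be positive definite.

```python
import math


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def corr_listwise(rows, a, b):
    """Correlate variables a and b using only cases complete on ALL variables."""
    complete = [r for r in rows if all(v is not None for v in r)]
    return pearson([(r[a], r[b]) for r in complete]), len(complete)


def corr_pairwise(rows, a, b):
    """Correlate variables a and b using every case observed on both of them."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    return pearson(pairs), len(pairs)
```

With five cases of which two lack the third variable, the listwise correlation between the first two variables uses three cases and the pairwise one uses all five, so the two estimates can differ noticeably even though neither variable involved has any missing values.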

7.
Large-sample properties of four methods of handling multivariate missing data are compared. The criterion for comparison is how well the loadings of a single-factor model can be estimated. It is shown that the efficiencies of the methods depend on the pattern or arrangement of missing data, and an evaluation study is used to generate predictive efficiency equations to guide the choice of an estimating procedure. A simple regression-type estimator is introduced that shows high efficiency relative to the maximum likelihood method over a wide range of patterns and covariance matrices.

8.
Tests of homogeneity of covariances (homoscedasticity) among several groups have many applications in statistical analysis. In the context of incomplete data analysis, tests of homoscedasticity among groups of cases with identical missing data patterns have been proposed to test whether data are missing completely at random (MCAR). These tests of MCAR require a large total sample size n and/or large group sample sizes n_i, and they usually fail when applied to nonnormal data. Hawkins (Technometrics 23:105–110, 1981) proposed a test of multivariate normality and homoscedasticity that is an exact test for complete data when the n_i are small. This paper proposes a modification of this test for complete data to improve its performance, and extends its application to testing homoscedasticity and MCAR when data are multivariate normal and incomplete. Moreover, it is shown that the statistic used in the Hawkins test, in conjunction with a nonparametric k-sample test, can be used to obtain a nonparametric test of homoscedasticity that works well for both normal and nonnormal data. It is explained how a combination of the proposed normal-theory Hawkins test and the nonparametric test can be employed to test for homoscedasticity, MCAR, and multivariate normality. Simulation studies show that the newly proposed tests generally outperform their existing competitors in terms of Type I error rejection rates. Also, a power study of the proposed tests indicates good power. The proposed methods use appropriate imputation methods to fill in the missing data. Methods of multiple imputation are described, and one of them is employed to confirm the results of our single imputation methods. Examples are provided in which multiple imputation enables one to identify a group or groups whose covariance matrices differ from those of the majority of other groups.

9.
A general approach for analyzing categorical data when there are missing data is described and illustrated. The method is based on generalized linear models with composite links. The approach can be used (among other applications) to fill in contingency tables with supplementary margins, fit loglinear models when data are missing, fit latent class models (with or without missing data on observed variables), fit models with fused cells (including many models from genetics), and fill in tables or fit models to data when variables are more finely categorized for some cases than for others. Both Newton-like and EM methods are easy to implement for parameter estimation. The author thanks the editor, the reviewers, Laurie Hopp Rindskopf, and Clifford Clogg for comments and suggestions that substantially improved the paper.
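
The paper frames these problems as generalized linear models with composite links. As a much narrower illustration of the EM route it mentions, the sketch below fits a saturated model (no loglinear constraints) to a two-way table with a supplementary row margin, that is, extra cases classified only by the row variable, assuming those cases are missing at random. The function name is hypothetical.

```python
def em_supplementary_margin(full, row_only, n_iter=200, tol=1e-12):
    """EM estimates of cell probabilities for an r x c table with a supplementary row margin.

    full: r x c counts for fully classified cases.
    row_only: length-r counts for cases whose column category is missing.
    Assumes the column variable is missing at random given the row variable.
    """
    r, c = len(full), len(full[0])
    total = sum(map(sum, full)) + sum(row_only)
    p = [[1.0 / (r * c)] * c for _ in range(r)]  # uniform starting values
    for _ in range(n_iter):
        # E-step: spread each row-only count over columns in proportion to p(col | row).
        expected = []
        for a in range(r):
            row_p = sum(p[a])
            expected.append([full[a][b] + row_only[a] * p[a][b] / row_p for b in range(c)])
        # M-step: for a saturated multinomial, probabilities are expected counts / total.
        new_p = [[expected[a][b] / total for b in range(c)] for a in range(r)]
        change = max(abs(new_p[a][b] - p[a][b]) for a in range(r) for b in range(c))
        p = new_p
        if change < tol:
            break
    return p
```

For a saturated model the fixed point has a closed form: each row's probability counts its supplementary cases, and the split within a row comes from the fully classified cases alone. With full counts [[20, 10], [5, 15]] and row-only counts [6, 4], that gives [[0.4, 0.2], [0.1, 0.3]].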

10.
In the diagnostic evaluation of educational systems, self-reports are commonly used to collect data, both cognitive and orectic. For various reasons, some of the students' data in these self-reports are frequently missing. The main goal of this research is to compare the performance of different imputation methods for missing data in the context of the evaluation of educational systems. On an empirical database of 5,000 subjects, 72 conditions were simulated: three levels of missing data, three types of loss mechanisms, and eight methods of imputation. The levels of missing data were 5%, 10%, and 20%. The loss mechanisms were missing completely at random, moderately conditioned, and strongly conditioned. The eight imputation methods were listwise deletion; replacement by the scale mean, the item mean, the subject mean, or the corrected subject mean; multiple regression; and the Expectation-Maximization (EM) algorithm with and without auxiliary variables. The results indicate that the recovery of the data is more accurate when an appropriate combination of different methods of recovering lost data is used. When a case is incomplete, the subject mean works very well, whereas for completely lost data, multiple imputation with the EM algorithm is recommended. This combination is especially recommended when more data are lost and the loss mechanism is more strongly conditioned. Finally, the results are discussed, and some future lines of research are analyzed.
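
Two of the simplest substitution rules in this comparison, the item mean and the subject (person) mean, can be sketched as follows, assuming a respondents-by-items list in which None marks a missing score. The function names are our own; the corrected subject mean, regression, and EM methods are not shown, and in practice the means would be computed within a single scale.

```python
def item_mean_impute(data):
    """Replace each missing score (None) with that item's mean over observed respondents."""
    n_items = len(data[0])
    means = []
    for j in range(n_items):
        obs = [row[j] for row in data if row[j] is not None]
        means.append(sum(obs) / len(obs))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def person_mean_impute(data):
    """Replace each missing score with the mean of that respondent's observed scores."""
    out = []
    for row in data:
        obs = [v for v in row if v is not None]
        pm = sum(obs) / len(obs)
        out.append([pm if v is None else v for v in row])
    return out
```

Item-mean substitution ignores how able the respondent is, while person-mean substitution ignores how difficult the item is; the corrected and two-way variants discussed elsewhere in this list try to combine both sources of information.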

11.
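Item response theory (IRT) is one of the modern educational and psychological measurement theories for objective measurement and is widely used in the analysis of large-scale tests, in which missing data are very common. For the two-parameter logistic model (2PLM) in IRT, EM algorithms for handling missing responses and missing abilities exist only under the missing completely at random mechanism. This study derives an EM algorithm that ignores missing responses under the 2PLM, and proposes an EM algorithm for handling missing responses and missing abilities under the missing at random mechanism, as well as a multiple imputation method that accounts for the uncertainty of both the ability estimates and the item responses. The results show that the EM algorithm that ignores missing responses and the multiple imputation method perform well across various missing data mechanisms, missing data proportions, and test designs.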

12.
The past decade has seen a noticeable shift in missing data handling techniques that assume a missing at random (MAR) mechanism, where the propensity for missing data on an outcome is related to other analysis variables. Although MAR is often reasonable, there are situations where this assumption is unlikely to hold, leading to biased parameter estimates. One such example is a longitudinal study of substance use where participants with the highest frequency of use also have the highest likelihood of attrition, even after controlling for other correlates of missingness. There is a large body of literature on missing not at random (MNAR) analysis models for longitudinal data, particularly in the field of biostatistics. Because these methods allow for a relationship between the outcome variable and the propensity for missing data, they require a weaker assumption about the missing data mechanism. This article describes two classic MNAR modeling approaches for longitudinal data: the selection model and the pattern mixture model. To date, these models have been slow to migrate to the social sciences, in part because they required complicated custom computer programs. These models are now quite easy to estimate in popular structural equation modeling programs, particularly Mplus. The purpose of this article is to describe these MNAR modeling frameworks and to illustrate their application on a real data set. Despite their potential advantages, MNAR-based analyses are not without problems and also rely on untestable assumptions. This article offers practical advice for implementing and choosing among different longitudinal models.

13.
During the last half century, hundreds of papers published in statistical journals have documented general conditions where reliance on least squares regression and Pearson's correlation can result in missing even strong associations between variables. Moreover, highly misleading conclusions can be made, even when the sample size is large. There are, in fact, several fundamental concerns related to non‐normality, outliers, heteroscedasticity, and curvature that can result in missing a strong association. Simultaneously, a vast array of new methods has been derived for effectively dealing with these concerns. The paper (i) reviews why least squares regression and classic inferential methods can fail, (ii) provides an overview of the many modern strategies for dealing with known problems, including some recent advances, and (iii) illustrates that modern robust methods can make a practical difference in our understanding of data. Included are some general recommendations regarding how modern methods might be used. Copyright © 2011 John Wiley & Sons, Ltd.
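
As one concrete example of a robust alternative to least squares (our choice for illustration; the paper surveys many methods, and this sketch does not claim to reproduce its recommendations), the Theil-Sen estimator takes the median of all pairwise slopes, so a single gross outlier cannot drag the fitted line far.

```python
from statistics import median


def ols_fit(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Theil-Sen estimator: median of pairwise slopes, then the median intercept."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept
```

On points lying exactly on y = 2x + 1 for x = 0, ..., 9, except for the last value replaced by 100, the Theil-Sen fit returns slope 2 and intercept 1, whereas the least squares slope is pulled above 6.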

14.
Structural equation models (SEMs) have become widely used to determine the interrelationships between latent and observed variables in social, psychological, and behavioural sciences. As heterogeneous data are very common in practical research in these fields, the analysis of mixture models has received a lot of attention in the literature. An important issue in the analysis of mixture SEMs is the presence of missing data, in particular of data missing with a non‐ignorable mechanism. However, only a limited amount of work has been done in analysing mixture SEMs with non‐ignorable missing data. The main objective of this paper is to develop a Bayesian approach for analysing mixture SEMs with an unknown number of components and non‐ignorable missing data. A simulation study shows that Bayesian estimates obtained by the proposed Markov chain Monte Carlo methods are accurate and the Bayes factor computed via a path sampling procedure is useful for identifying the correct number of components, selecting an appropriate missingness mechanism, and investigating various effects of latent variables in the mixture SEMs. A real data set on a study of job satisfaction is used to demonstrate the methodology.

15.
This article provides the theory and application of the two-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis makes the missing data mechanism more likely to be MAR. It is much easier to include auxiliary variables in two-stage ML than in direct ML. Based on the most recent developments for missing data with an unknown population distribution, the article first provides a minimally technical explanation of why normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. The article also provides sufficient conditions for two-stage ML to be a valid statistical procedure in the general case. For the application of two-stage ML, an SAS IML program is given to perform the first-stage analysis, and EQS code is provided to perform the second-stage analysis. An example with open- and closed-book examination data is used to illustrate the application of the provided programs. One aim is for quantitative graduate students and applied psychometricians to understand the technical details of missing data analysis. Another aim is for applied researchers to use the method properly.
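
In the two-stage approach, the first stage estimates a saturated mean vector and covariance matrix by normal-theory ML from the incomplete data, and the second stage fits the structural model to those estimates. The sketch below illustrates only the first stage, in Python rather than the SAS IML program the article provides, for the simplest possible case: two variables with missing values on the second only. It uses the EM algorithm; the function name and starting values are assumptions.

```python
def em_bivariate_normal(data, n_iter=1000, tol=1e-12):
    """Normal-theory ML mean and covariance of (y1, y2) when some y2 are None.

    y1 is assumed fully observed, so its mean and variance never change.
    Returns ((mu1, mu2), ((s11, s12), (s12, s22))) with ML (divisor n) moments.
    """
    n = len(data)
    observed = [(a, b) for a, b in data if b is not None]
    # Starting values: y1 moments from all cases, y2 moments from complete cases.
    mu1 = sum(a for a, _ in data) / n
    mu2 = sum(b for _, b in observed) / len(observed)
    s11 = sum((a - mu1) ** 2 for a, _ in data) / n
    s22 = sum((b - mu2) ** 2 for _, b in observed) / len(observed)
    s12 = 0.0
    for _ in range(n_iter):
        # E-step: conditional mean and second moment of a missing y2 given y1.
        beta = s12 / s11
        resid_var = s22 - beta * s12
        sum2 = sum22 = sum12 = 0.0
        for a, b in data:
            if b is None:
                e2 = mu2 + beta * (a - mu1)
                e22 = e2 * e2 + resid_var
            else:
                e2, e22 = b, b * b
            sum2 += e2
            sum22 += e22
            sum12 += a * e2
        # M-step: ML mean and covariance from the completed sufficient statistics.
        new_mu2 = sum2 / n
        new_s22 = sum22 / n - new_mu2 ** 2
        new_s12 = sum12 / n - mu1 * new_mu2
        change = max(abs(new_mu2 - mu2), abs(new_s22 - s22), abs(new_s12 - s12))
        mu2, s22, s12 = new_mu2, new_s22, new_s12
        if change < tol:
            break
    return (mu1, mu2), ((s11, s12), (s12, s22))
```

For this monotone pattern the ML estimates also have a closed form (the complete-case regression of y2 on y1 combined with the all-case mean of y1), which makes a convenient check: with cases (1, 2), (2, 3.5), (3, 3), (4, 5.5), (5, missing), and (6, missing), EM should converge to a mean of about 4.5 for y2.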

16.
Model evaluation in covariance structure analysis is critical before the results can be trusted. Because of finite sample sizes and unknown distributions of real data, existing conclusions regarding a particular statistic may not be applicable in practice. The bootstrap procedure automatically takes care of the unknown distribution and, for a given sample size, also provides more accurate results than those based on standard asymptotics. But the procedure needs a matrix to play the role of the population covariance matrix. The closer this matrix is to the true population covariance matrix, the more valid the bootstrap inference. The current paper proposes a class of covariance matrices by combining theory and data, so that a proper matrix from this class is closer to the true population covariance matrix than those constructed by any existing method. Each of the covariance matrices is easy to generate and also satisfies several desired properties. An example with nine cognitive variables and a confirmatory factor model illustrates the details of creating population covariance matrices with different misspecifications. When the substantive model is evaluated, bootstrap or simulation procedures based on these matrices will lead to more accurate conclusions than those based on artificial covariance matrices.

17.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or simply at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.
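
The corrected item mean substitution named above can be sketched with one common definition from the psychometric missing-data literature, which we assume here: the imputed score is the item's mean rescaled by the ratio of the person's observed mean to the average item mean of the items that person answered. The study used multiple-imputation variants; this sketch shows only the deterministic core, with None marking missing scores and a hypothetical function name.

```python
def corrected_item_mean_impute(data):
    """Corrected item mean substitution (definition assumed, see text).

    Missing X[i][j] becomes IM_j * PM_i / mean(IM_k for items k that person i answered),
    where IM is an item mean and PM a person mean over observed scores.
    """
    n_items = len(data[0])
    item_means = []
    for j in range(n_items):
        obs = [row[j] for row in data if row[j] is not None]
        item_means.append(sum(obs) / len(obs))
    out = []
    for row in data:
        seen = [j for j, v in enumerate(row) if v is not None]
        person_mean = sum(row[j] for j in seen) / len(seen)
        # Average item mean ("easiness") of the items this person actually answered.
        seen_item_mean = sum(item_means[j] for j in seen) / len(seen)
        out.append([
            item_means[j] * person_mean / seen_item_mean if v is None else v
            for j, v in enumerate(row)
        ])
    return out
```

The correction factor is above 1 for respondents who score higher than the items they answered would predict, so their imputed scores are pushed up relative to plain item-mean substitution.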

18.
Missing Data: A Conceptual Review for Applied Psychologists (total citations: 9; self-citations: 0; citations by others: 9)
There has been conspicuously little research concerning missing data problems in the applied psychology literature. Fortunately, other fields have begun to investigate this issue. These include survey research, marketing, statistics, economics, and biometrics. A review of this literature suggests several trends for applied psychologists. For example, listwise deletion of data is often the least accurate technique to deal with missing data. Other methods for estimating missing data scores may be more accurate and preserve more data for investigators to analyze. Further, the literature reveals that the amount of missing data and the reasons for deletion of data impact how investigators should handle the problem. Finally, there is a great need for more investigation of strategies for dealing with missing data, especially when data are missing in nonrandom or systematic patterns.

19.
Exploratory factor analysis (EFA) is an extremely popular method for determining the underlying factor structure of a set of variables. Because of its exploratory nature, EFA is notorious for being conducted with small sample sizes, and recent reviews of psychological research have reported that between 40% and 60% of applied studies have 200 or fewer observations. Recent methodological studies have addressed sample size requirements for EFA models with small samples; however, these studies have considered only complete data, which are the exception rather than the rule in psychology. Furthermore, the extant literature on missing data techniques with small samples is scant, and nearly all existing studies focus on topics that are not of primary interest for EFA models. Therefore, this article presents a simulation to assess the performance of various missing data techniques for EFA models with both small samples and missing data. Results show that deletion methods do not extract the proper number of factors and estimate the factor loadings with severe bias, even when data are missing completely at random. Predictive mean matching is the best method overall for extracting the correct number of factors and estimating factor loadings without bias, although two-stage estimation was a close second.
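
Predictive mean matching, the best overall method in this simulation, can be sketched for a single fully observed predictor: fit a regression on the complete cases, predict every case, and fill each missing value with the observed value of one of the k cases whose predictions are closest. This is a simplified single-imputation sketch under our own assumptions (one predictor, OLS fit, random choice among k donors, hypothetical function name); full multiple-imputation implementations also draw the regression parameters to reflect their uncertainty.

```python
import random


def pmm_impute(x, y, k=3, rng=None):
    """Predictive mean matching for y (None = missing) given a fully observed x."""
    rng = rng or random.Random(0)
    obs = [i for i, v in enumerate(y) if v is not None]
    # 1. OLS of y on x using the complete cases.
    mx = sum(x[i] for i in obs) / len(obs)
    my = sum(y[i] for i in obs) / len(obs)
    slope = (sum((x[i] - mx) * (y[i] - my) for i in obs)
             / sum((x[i] - mx) ** 2 for i in obs))
    # 2. Predicted value for every case, observed or not.
    pred = [my + slope * (xi - mx) for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # 3. The k observed cases with the closest predictions are the donors;
            #    copy the OBSERVED y of one donor chosen at random.
            donors = sorted(obs, key=lambda d: abs(pred[d] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]
    return out
```

Because every imputed value is copied from an observed case, imputations stay within the observed range and on the observed measurement scale, which is part of the method's appeal.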

20.
Multiple imputation under a two-way model with error is a simple and effective method that has been used to handle missing item scores in unidimensional test and questionnaire data. Extensions of this method to multidimensional data are proposed. A simulation study is used to investigate whether these extensions produce biased estimates of important statistics in multidimensional data, and to compare them with listwise deletion (as a lower benchmark), two-way with error, and multivariate normal imputation. The new methods produce smaller bias in several psychometrically interesting statistics than the existing methods of two-way with error and multivariate normal imputation. One of these new methods is clearly preferable for handling missing item scores in multidimensional test data.
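
For reference, the unidimensional two-way method with error that the article extends can be sketched as follows. The deterministic part is person mean plus item mean minus overall mean; the random part is a normal draw whose variance we take, as an assumption, to be the variance of the residuals over observed cells. Rounding imputed scores to admissible answer categories is omitted, the function name is hypothetical, and the multidimensional extensions proposed in the article are not shown. Running it several times with different seeds would give multiple imputations.

```python
import random


def two_way_impute(data, with_error=True, rng=None):
    """Two-way imputation TW_ij = PM_i + IM_j - OM, optionally plus normal error.

    data: respondents x items, None marks a missing score.
    PM_i / IM_j: person / item means over observed scores; OM: mean of all observed scores.
    """
    rng = rng or random.Random(0)
    n, p = len(data), len(data[0])
    cells = [(i, j) for i in range(n) for j in range(p) if data[i][j] is not None]
    om = sum(data[i][j] for i, j in cells) / len(cells)
    pm = []
    for row in data:
        obs = [v for v in row if v is not None]
        pm.append(sum(obs) / len(obs))
    im = []
    for j in range(p):
        obs = [data[i][j] for i in range(n) if data[i][j] is not None]
        im.append(sum(obs) / len(obs))
    sd = 0.0
    if with_error:
        # Error SD: sample SD of the two-way residuals over observed cells (assumed).
        resid = [data[i][j] - (pm[i] + im[j] - om) for i, j in cells]
        mean_r = sum(resid) / len(resid)
        sd = (sum((r - mean_r) ** 2 for r in resid) / (len(resid) - 1)) ** 0.5
    out = [row[:] for row in data]
    for i in range(n):
        for j in range(p):
            if out[i][j] is None:
                tw = pm[i] + im[j] - om
                out[i][j] = tw + (rng.gauss(0.0, sd) if with_error else 0.0)
    return out
```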


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417