期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Methods for Mediation Analysis with Missing Data

Zhiyong Zhang Lijuan Wang 《Psychometrika》2013,78(1):154-184

Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms. 相似文献

2.

Missing data: our view of the state of the art 总被引：5，自引：0，他引：5

Schafer JL Graham JW 《心理学方法》2002,7(2):147-177

Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art. 相似文献

3.

Evaluation of the Bayesian and Maximum Likelihood Approaches in Analyzing Structural Equation Models with Small Sample Sizes

《Multivariate behavioral research》2013,48(4):653-686

The main objective of this article is to investigate the empirical performances of the Bayesian approach in analyzing structural equation models with small sample sizes. The traditional maximum likelihood (ML) is also included for comparison. In the context of a confirmatory factor analysis model and a structural equation model, simulation studies are conducted with the different magnitudes of parameters and sample sizes n = da, where d = 2, 3, 4 and 5, and a is the number of unknown parameters. The performances are evaluated in terms of the goodness-of-fit statistics, and various measures on the accuracy of the estimates. The conclusion is: for data that are normally distributed, the Bayesian approach can be used with small sample sizes, whilst ML cannot. 相似文献

4.

LGM模型中缺失数据处理方法的比较:ML方法与Diggle-Kenward选择模型

张杉杉陈楠刘红云《心理学报》2017,49(5)

追踪研究中缺失数据十分常见。本文通过Monte Carlo模拟研究,考察基于不同前提假设的Diggle-Kenward选择模型和ML方法对增长参数估计精度的差异,并考虑样本量、缺失比例、目标变量分布形态以及不同缺失机制的影响。结果表明:(1)缺失机制对基于MAR的ML方法有较大的影响,在MNAR缺失机制下,基于MAR的ML方法对LGM模型中截距均值和斜率均值的估计不具有稳健性。(2)DiggleKenward选择模型更容易受到目标变量分布偏态程度的影响,样本量与偏态程度存在交互作用,样本量较大时,偏态程度的影响会减弱。而ML方法仅在MNAR机制下轻微受到偏态程度的影响。相似文献

5.

Estimating Incremental Validity Under Missing Data

Dustin A. Fife Jorge L. Mendoza Christopher M. Berry 《Multivariate behavioral research》2017,52(2):164-177

A common form of missing data is caused by selection on an observed variable (e.g., Z). If the selection variable was measured and is available, the data are regarded as missing at random (MAR). Selection biases correlation, reliability, and effect size estimates when these estimates are computed on listwise deleted (LD) data sets. On the other hand, maximum likelihood (ML) estimates are generally unbiased and outperform LD in most situations, at least when the data are MAR. The exception is when we estimate the partial correlation. In this situation, LD estimates are unbiased when the cause of missingness is partialled out. In other words, there is no advantage of ML estimates over LD estimates in this situation. We demonstrate that under a MAR condition, even ML estimates may become biased, depending on how partial correlations are computed. Finally, we conclude with recommendations about how future researchers might estimate partial correlations even when the cause of missingness is unknown and, perhaps, unknowable. 相似文献

6.

基于增长模型的非随机缺失数据处理:选择模型和极大似然方法

陈楠刘红云《心理科学》2015,(2):446-451

对含有非随机缺失数据的潜变量增长模型,为了考察基于不同假设的缺失数据处理方法:极大似然(ML)方法与DiggleKenward选择模型的优劣,通过Monte Carlo模拟研究,比较两种方法对模型中增长参数估计精度及其标准误估计的差异,并考虑样本量、非随机缺失比例和随机缺失比例的影响。结果表明,符合前提假设的Diggle-Kenward选择模型的参数估计精度普遍高于ML方法;对于标准误估计值,ML方法存在一定程度的低估,得到的置信区间覆盖比率也明显低于Diggle-Kenward选择模型。相似文献

7.

Missing not at random models for latent growth curve analyses

Enders CK 《心理学方法》2011,16(1):1-16

The past decade has seen a noticeable shift in missing data handling techniques that assume a missing at random (MAR) mechanism, where the propensity for missing data on an outcome is related to other analysis variables. Although MAR is often reasonable, there are situations where this assumption is unlikely to hold, leading to biased parameter estimates. One such example is a longitudinal study of substance use where participants with the highest frequency of use also have the highest likelihood of attrition, even after controlling for other correlates of missingness. There is a large body of literature on missing not at random (MNAR) analysis models for longitudinal data, particularly in the field of biostatistics. Because these methods allow for a relationship between the outcome variable and the propensity for missing data, they require a weaker assumption about the missing data mechanism. This article describes 2 classic MNAR modeling approaches for longitudinal data: the selection model and the pattern mixture model. To date, these models have been slow to migrate to the social sciences, in part because they required complicated custom computer programs. These models are now quite easy to estimate in popular structural equation modeling programs, particularly Mplus. The purpose of this article is to describe these MNAR modeling frameworks and to illustrate their application on a real data set. Despite their potential advantages, MNAR-based analyses are not without problems and also rely on untestable assumptions. This article offers practical advice for implementing and choosing among different longitudinal models. 相似文献

8.

A Class of Distribution-Free Models for Longitudinal Mediation Analysis

D. Gunzler W. Tang N. Lu P. Wu X. M. Tu 《Psychometrika》2014,79(4):543-568

Mediation analysis constitutes an important part of treatment study to identify the mechanisms by which an intervention achieves its effect. Structural equation model (SEM) is a popular framework for modeling such causal relationship. However, current methods impose various restrictions on the study designs and data distributions, limiting the utility of the information they provide in real study applications. In particular, in longitudinal studies missing data is commonly addressed under the assumption of missing at random (MAR), where current methods are unable to handle such missing data if parametric assumptions are violated. In this paper, we propose a new, robust approach to address the limitations of current SEM within the context of longitudinal mediation analysis by utilizing a class of functional response models (FRM). Being distribution-free, the FRM-based approach does not impose any parametric assumption on data distributions. In addition, by extending the inverse probability weighted (IPW) estimates to the current context, the FRM-based SEM provides valid inference for longitudinal mediation analysis under the two most popular missing data mechanisms; missing completely at random (MCAR) and missing at random (MAR). We illustrate the approach with both real and simulated data. 相似文献

9.

Abstract: Evaluation of Test Statistics for Robust Structural Equation Modeling With Nonnormal Missing Data

Xin Tong Zhiyong Zhang Ke-Hai Yuan 《Multivariate behavioral research》2013,48(6)

Traditional structural equation modeling (SEM) techniques have trouble dealing with incomplete and/or nonnormal data that are often encountered in practice. Yuan and Zhang (2011a) developed a two-stage procedure for SEM to handle nonnormal missing data and proposed four test statistics for overall model evaluation. Although these statistics have been shown to work well with complete data, their performance for incomplete data has not been investigated in the context of robust statistics.

Focusing on a linear growth curve model, a systematic simulation study is conducted to evaluate the accuracy of the parameter estimates and the performance of five test statistics including the naive statistic derived from normal distribution based maximum likelihood (ML), the Satorra-Bentler scaled chi-square statistic (RML), the mean- and variance-adjusted chi-square statistic (AML), Yuan-Bentler residual-based test statistic (CRADF), and Yuan-Bentler residual-based F statistic (RF). Data are generated and analyzed in R using the package rsem (Yuan & Zhang, 2011b).

Based on the simulation study, we can observe the following: (a) The traditional normal distribution-based method cannot yield accurate parameter estimates for nonnormal data, whereas the robust method obtains much more accurate model parameter estimates for nonnormal data and performs almost as well as the normal distribution based method for normal distributed data. (b) With the increase of sample size, or the decrease of missing rate or the number of outliers, the parameter estimates are less biased and the empirical distributions of test statistics are closer to their nominal distributions. (c) The ML test statistic does not work well for nonnormal or missing data. (d) For nonnormal complete data, CRADF and RF work relatively better than RML and AML. (e) For missing completely at random (MCAR) missing data, in almost all the cases, RML and AML work better than CRADF and RF. (f) For nonnormal missing at random (MAR) missing data, CRADF and RF work better than AML. (g) The performance of the robust method does not seem to be influenced by the symmetry of outliers. 相似文献

10.

An introduction to modern missing data analyses 总被引：2，自引：0，他引：2

Amanda N. Baraldi Craig K. Enders 《Journal of School Psychology》2010,48(1):5-37

相似文献

11.

A Cautious Note on Auxiliary Variables That Can Increase Bias in Missing Data Problems

Felix Thoemmes Norman Rose 《Multivariate behavioral research》2013,48(5):443-459

The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increase the likelihood that missing at random is achieved consists of including many covariates as so-called auxiliary variables. These variables are either included based on data considerations or in an inclusive fashion; that is, taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways how one can avoid selecting bias-inducing covariates as auxiliary variables. 相似文献

12.

Postmodeling Sensitivity Analysis to Detect the Effect of Missing Data Mechanisms

Mortaza Jamshidian Matthew Mata 《Multivariate behavioral research》2013,48(3):432-452

Incomplete or missing data is a common problem in almost all areas of empirical research. It is well known that simple and ad hoc methods such as complete case analysis or mean imputation can lead to biased and/or inefficient estimates. The method of maximum likelihood works well; however, when the missing data mechanism is not one of missing completely at random (MCAR) or missing at random (MAR), it too can result in incorrect inference. Statistical tests for MCAR have been proposed, but these are restricted to a certain class of problems. The idea of sensitivity analysis as a means to detect the missing data mechanism has been proposed in the statistics literature in conjunction with selection models where conjointly the data and missing data mechanism are modeled. Our approach is different here in that we do not model the missing data mechanism but use the data at hand to examine the sensitivity of a given model to the missing data mechanism. Our methodology is meant to raise a flag for researchers when the assumptions of MCAR (or MAR) do not hold. To our knowledge, no specific proposal for sensitivity analysis has been set forth in the area of structural equation models (SEM). This article gives a specific method for performing postmodeling sensitivity analysis using a statistical test and graphs. A simulation study is performed to assess the methodology in the context of structural equation models. This study shows success of the method, especially when the sample size is 300 or more and the percentage of missing data is 20% or more. The method is also used to study a set of real data measuring physical and social self-concepts in 463 Nigerian adolescents using a factor analysis model. 相似文献

13.

Using Principal Components as Auxiliary Variables in Missing Data Estimation

Waylon J. Howard Mijke Rhemtulla Todd D. Little 《Multivariate behavioral research》2013,48(3):285-299

To deal with missing data that arise due to participant nonresponse or attrition, methodologists have recommended an “inclusive” strategy where a large set of auxiliary variables are used to inform the missing data process. In practice, the set of possible auxiliary variables is often too large. We propose using principal components analysis (PCA) to reduce the number of possible auxiliary variables to a manageable number. A series of Monte Carlo simulations compared the performance of the inclusive strategy with eight auxiliary variables (inclusive approach) to the PCA strategy using just one principal component derived from the eight original variables (PCA approach). We examined the influence of four independent variables: magnitude of correlations, rate of missing data, missing data mechanism, and sample size on parameter bias, root mean squared error, and confidence interval coverage. Results indicate that the PCA approach results in unbiased parameter estimates and potentially more accuracy than the inclusive approach. We conclude that using the PCA strategy to reduce the number of auxiliary variables is an effective and practical way to reap the benefits of the inclusive strategy in the presence of many possible auxiliary variables. 相似文献

14.

Empirically Corrected Rescaled Statistics for SEM with Small N and Large p

Ke-Hai Yuan Miao Yang Ge Jiang 《Multivariate behavioral research》2017,52(6):673-698

Survey data often contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. With typical nonnormally distributed data in practice, a rescaled statistic T_rml proposed by Satorra and Bentler was recommended in the literature of SEM. However, T_rml has been shown to be problematic when the sample size N is small and/or the number of variables p is large. There does not exist a reliable test statistic for SEM with small N or large p, especially with nonnormally distributed data. Following the principle of Bartlett correction, this article develops empirical corrections to T_rml so that the mean of the empirically corrected statistics approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics control type I errors reasonably well even when N is smaller than 2p, where T_rml may reject the correct model 100% even for normally distributed data. The application of the empirically corrected statistics is illustrated via a real data example. 相似文献

15.

Asymptotic bias of normal-distribution-based maximum likelihood estimates of moderation effects with data missing at random

Qian Zhang Ke-Hai Yuan Lijuan Wang 《The British journal of mathematical and statistical psychology》2019,72(2):334-354

Moderation analysis is useful for addressing interesting research questions in social sciences and behavioural research. In practice, moderated multiple regression (MMR) models have been most widely used. However, missing data pose a challenge, mainly because the interaction term is a product of two or more variables and thus is a non-linear function of the involved variables. Normal-distribution-based maximum likelihood (NML) has been proposed and applied for estimating MMR models with incomplete data. When data are missing completely at random, moderation effect estimates are consistent. However, simulation results have found that when data in the predictor are missing at random (MAR), NML can yield inaccurate estimates of moderation effects when the moderation effects are non-null. Simulation studies are subject to the limitation of confounding systematic bias with sampling errors. Thus, the purpose of this paper is to analytically derive asymptotic bias of NML estimates of moderation effects with MAR data. Results show that when the moderation effect is zero, there is no asymptotic bias in moderation effect estimates with either normal or non-normal data. When the moderation effect is non-zero, however, asymptotic bias may exist and is determined by factors such as the moderation effect size, missing-data proportion, and type of missingness dependence. Our analytical results suggest that researchers should apply NML to MMR models with caution when missing data exist. Suggestions are given regarding moderation analysis with missing data. 相似文献

16.

Exploratory Factor Analysis With Small Samples and Missing Data

Daniel McNeish 《Journal of personality assessment》2017,99(6):637-652

Exploratory factor analysis (EFA) is an extremely popular method for determining the underlying factor structure for a set of variables. Due to its exploratory nature, EFA is notorious for being conducted with small sample sizes, and recent reviews of psychological research have reported that between 40% and 60% of applied studies have 200 or fewer observations. Recent methodological studies have addressed small size requirements for EFA models; however, these models have only considered complete data, which are the exception rather than the rule in psychology. Furthermore, the extant literature on missing data techniques with small samples is scant, and nearly all existing studies focus on topics that are not of primary interest to EFA models. Therefore, this article presents a simulation to assess the performance of various missing data techniques for EFA models with both small samples and missing data. Results show that deletion methods do not extract the proper number of factors and estimate the factor loadings with severe bias, even when data are missing completely at random. Predictive mean matching is the best method overall when considering extracting the correct number of factors and estimating factor loadings without bias, although 2-stage estimation was a close second. 相似文献

17.

Full Information Maximum Likelihood Estimation for Latent Variable Interactions With Incomplete Indicators

Heining Cham Evgeniya Reshetnyak Barry Rosenfeld William Breitbart 《Multivariate behavioral research》2017,52(1):12-30

Researchers have developed missing data handling techniques for estimating interaction effects in multiple regression. Extending to latent variable interactions, we investigated full information maximum likelihood (FIML) estimation to handle incompletely observed indicators for product indicator (PI) and latent moderated structural equations (LMS) methods. Drawing on the analytic work on missing data handling techniques in multiple regression with interaction effects, we compared the performance of FIML for PI and LMS analytically. We performed a simulation study to compare FIML for PI and LMS. We recommend using FIML for LMS when the indicators are missing completely at random (MCAR) or missing at random (MAR) and when they are normally distributed. FIML for LMS produces unbiased parameter estimates with small variances, correct Type I error rates, and high statistical power of interaction effects. We illustrated the use of these methods by analyzing the interaction effect between advanced cancer patients’ depression and change of inner peace well-being on future hopelessness levels. 相似文献

18.

Identifying Variables Responsible for Data not Missing at Random

Ke-Hai Yuan 《Psychometrika》2009,74(2):233-256

When data are not missing at random (NMAR), maximum likelihood (ML) procedure will not generate consistent parameter estimates unless the missing data mechanism is correctly modeled. Understanding NMAR mechanism in a data set would allow one to better use the ML methodology. A survey or questionnaire may contain many items; certain items may be responsible for NMAR values in other items. The paper develops statistical procedures to identify the responsible items. By comparing ML estimates (MLE), statistics are developed to test whether the MLEs are changed when excluding items. The items that cause a significant change of the MLEs are responsible for the NMAR mechanism. Normal distribution is used for obtaining the MLEs; a sandwich-type covariance matrix is used to account for distribution violations. The class of nonnormal distributions within which the procedure is valid is provided. Both saturated and structural models are considered. Effect sizes are also defined and studied. The results indicate that more missing data in a sample does not necessarily imply more significant test statistics due to smaller effect sizes. Knowing the true population means and covariances or the parameter values in structural equation models may not make things easier either. The research was supported by NSF grant DMS04-37167, the James McKeen Cattell Fund. 相似文献

19.

A modified procedure for parallel analysis of ordered categorical data

Liu OL Rijmen F 《Behavior research methods》2008,40(2):556-562

Parallel analysis has been well documented to be an effective and accurate method for determining the number of factors to retain in exploratory factor analysis. The O'Connor (2000) procedure for parallel analysis has many benefits and is widely applied, yet it has a few shortcomings in dealing with missing data and ordinal variables. To address these technical issues, we adapted and modified the O'Connor procedure to provide an alternative method that better approximates the ordinal data by factoring in the frequency distributions of the variables (e.g., the number of response categories and the frequency of each response category per variable). The theoretical and practical differences between the modified procedure and the O'Connor procedure are discussed. The SAS syntax for implementing this modified procedure is also provided. 相似文献

20.

Multiple Imputation of Item Scores in Test and Questionnaire Data,and Influence on Psychometric Results

Joost R. van Ginkel L. Andries van der Ark Klaas Sijtsma 《Multivariate behavioral research》2013,48(2):387-414

The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at random, or not missing at random. Cronbach's alpha, Loevinger's scalability coefficient H, and the item cluster solution from Mokken scale analysis of the complete data were compared with the corresponding results based on the data including imputed scores. The multiple-imputation methods, two-way with normally distributed errors, corrected item-mean substitution with normally distributed errors, and response function, produced discrepancies in Cronbach's coefficient alpha, Loevinger's coefficient H, and the cluster solution from Mokken scale analysis, that were smaller than the discrepancies in upper benchmark multivariate normal imputation. 相似文献