Similar Articles
A total of 20 similar articles were retrieved (search time: 31 ms).
1.
The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum likelihood are used much more frequently. These methods assume that data are missing at random. One very common approach to increasing the likelihood that the missing-at-random assumption is satisfied consists of including many covariates as so-called auxiliary variables. These variables are either included based on data considerations or in an inclusive fashion; that is, taking all available auxiliary variables. In this article, we point out that there are some instances in which auxiliary variables exhibit the surprising property of increasing bias in missing data problems. In a series of focused simulation studies, we highlight some situations in which this type of biasing behavior can occur. We briefly discuss possible ways in which one can avoid selecting bias-inducing covariates as auxiliary variables.
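As an illustration of the inclusive auxiliary-variable strategy discussed above, the following sketch imputes an incomplete analysis variable with and without an extra covariate in the imputation model. The data-generating model, the variable names, and the use of scikit-learn's IterativeImputer as a simple stand-in for a full multiple-imputation routine are illustrative assumptions, not the procedure used in the article.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                            # fully observed predictor
aux = 0.6 * x + rng.normal(scale=0.8, size=n)     # candidate auxiliary variable
y = 0.5 * x + rng.normal(size=n)                  # analysis variable, partly missing

# Probability of missingness on y depends only on the observed x (a MAR pattern).
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 0.5)))
y_obs = np.where(miss, np.nan, y)

def imputed_slope(columns):
    """Impute the first column within the given column set, then regress it on x."""
    filled = IterativeImputer(random_state=0).fit_transform(np.column_stack(columns))
    return np.polyfit(filled[:, 1], filled[:, 0], 1)[0]   # slope of y on x

print("slope of y on x, no auxiliary:  ", round(imputed_slope([y_obs, x]), 3))
print("slope of y on x, with auxiliary:", round(imputed_slope([y_obs, x, aux]), 3))
print("complete-data slope:            ", round(np.polyfit(x, y, 1)[0], 3))
```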

2.
Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms.  相似文献   
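A minimal sketch of the kind of comparison described above: the indirect effect a*b of a single-mediator model is estimated after listwise deletion and after (single) imputation. The data-generating model and variable names are assumptions for illustration; the bmem package mentioned in the abstract is an R package and is not used here, and a full MI or TS-ML analysis would go further.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)                # mediator
y = 0.4 * m + 0.2 * x + rng.normal(size=n)      # outcome
df = pd.DataFrame({"x": x, "m": m, "y": y})
df.loc[rng.random(n) < 0.2, "m"] = np.nan       # MCAR missingness on the mediator

def indirect_effect(d):
    a = np.polyfit(d["x"], d["m"], 1)[0]        # path a: x -> m
    design = np.column_stack([d["m"], d["x"], np.ones(len(d))])
    b = np.linalg.lstsq(design, d["y"], rcond=None)[0][0]   # path b: m -> y given x
    return a * b

imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                       columns=df.columns)
print("indirect effect, listwise deletion:", round(indirect_effect(df.dropna()), 3))
print("indirect effect, imputed data:     ", round(indirect_effect(imputed), 3))
```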

3.
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix independently and concurrently, thus destroying the entire correlational structure of the data. This strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but is not suitable for assessing the significance of the contribution of single variables. Alternatively, we propose a strategy involving permutation of one variable at a time, while keeping the other variables fixed. We compare the two approaches in a simulation study, considering proportions of Type I and Type II error. We use two corrections for multiple testing: the Bonferroni correction and controlling the False Discovery Rate (FDR). To assess the significance of the variance accounted for by the variables, permuting one variable at a time, combined with FDR correction, yields the most favorable results. This optimal strategy is applied to an empirical data set, and results are compared with bootstrap confidence intervals.  相似文献   
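A minimal sketch of the permutation strategy favored above: one variable is permuted at a time while the others are kept fixed, and the variance of that variable accounted for by the first k components is used as the test statistic. The statistic, the number of permutations, and the simulated data are illustrative choices; the FDR correction applied in the study is omitted here.

```python
import numpy as np
from sklearn.decomposition import PCA

def variance_accounted(X, j, k=2):
    """Proportion of the variance of column j reproduced by the first k components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    pca = PCA(n_components=k).fit(Z)
    Z_hat = pca.inverse_transform(pca.transform(Z))
    return 1.0 - np.var(Z[:, j] - Z_hat[:, j]) / np.var(Z[:, j])

def permutation_pvalue(X, j, k=2, n_perm=199, seed=0):
    rng = np.random.default_rng(seed)
    observed = variance_accounted(X, j, k)
    exceed = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])    # permute variable j only, keep the rest fixed
        exceed += variance_accounted(Xp, j, k) >= observed
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 0] += 0.8 * X[:, 1]                        # give variables 0 and 1 shared structure
print([round(permutation_pvalue(X, j), 3) for j in range(X.shape[1])])
```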

4.
Ordinal data occur frequently in the social sciences. When applying principal component analysis (PCA), however, those data are often treated as numeric, implying linear relationships between the variables at hand; alternatively, non-linear PCA is applied where the obtained quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components (PCs) is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects and offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, an application of penalized optimal scaling to ordinal data as given with the International Classification of Functioning, Disability and Health (ICF) is provided.  相似文献   

5.
Many studies yield multivariate multiblock data, that is, multiple data blocks that all involve the same set of variables (e.g., the scores of different groups of subjects on the same set of variables). The question then arises whether the same processes underlie the different data blocks. To explore the structure of such multivariate multiblock data, component analysis can be very useful. Specifically, 2 approaches are often applied: principal component analysis (PCA) on each data block separately and different variants of simultaneous component analysis (SCA) on all data blocks simultaneously. The PCA approach yields a different loading matrix for each data block and is thus not useful for discovering structural similarities. The SCA approach may fail to yield insight into structural differences, since the obtained loading matrix is identical for all data blocks. We introduce a new generic modeling strategy, called clusterwise SCA, that comprises the separate PCA approach and SCA as special cases. The key idea behind clusterwise SCA is that the data blocks form a few clusters, where data blocks that belong to the same cluster are modeled with SCA and thus have the same structure, and different clusters have different underlying structures. In this article, we use the SCA variant that imposes equal average cross-products constraints (ECP). An algorithm for fitting clusterwise SCA-ECP solutions is proposed and evaluated in a simulation study. Finally, the usefulness of clusterwise SCA is illustrated by empirical examples from eating disorder research and social psychology.
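A bare-bones sketch of a clusterwise component-analysis loop in the spirit of the method described above: data blocks are assigned to the cluster whose shared component solution reconstructs them best, and a joint PCA is refit per cluster. This is a simplified stand-in for the SCA-ECP algorithm (ordinary PCA is used for the within-cluster step), and the block sizes, number of clusters, and number of components are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def clusterwise_sca(blocks, n_clusters=2, n_components=2, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=len(blocks))
    for _ in range(n_iter):
        models = []
        for c in range(n_clusters):
            members = [b for b, lab in zip(blocks, labels) if lab == c]
            if not members:                                  # guard against empty clusters
                members = [blocks[rng.integers(len(blocks))]]
            # One shared component solution per cluster, fitted on the stacked blocks.
            models.append(PCA(n_components=n_components).fit(np.vstack(members)))
        def loss(b, m):                                      # reconstruction error of a block
            return np.sum((b - m.inverse_transform(m.transform(b))) ** 2)
        labels = np.array([np.argmin([loss(b, m) for m in models]) for b in blocks])
    return labels, models

rng = np.random.default_rng(1)
blocks = ([rng.normal(size=(50, 6)) * [3, 2, 1, 1, 1, 1] for _ in range(4)]
          + [rng.normal(size=(50, 6)) * [1, 1, 1, 1, 2, 3] for _ in range(4)])
labels, _ = clusterwise_sca(blocks)
print("cluster membership of the 8 blocks:", labels)
```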

6.
Exploratory factor analysis is a popular statistical technique used in communication research. Although exploratory factor analysis (EFA) and principal components analysis (PCA) are different techniques, PCA is often employed incorrectly to reveal latent constructs (i.e., factors) of observed variables, which is the purpose of EFA. PCA is more appropriate for reducing measured variables into a smaller set of variables (i.e., components) while retaining as much of the total variance in the measured variables as possible. Furthermore, the popular use of varimax rotation raises some concerns about the relationships among the factors that researchers claim to discover. This paper discusses the distinct purposes of PCA and EFA, using two data sets as examples to highlight the differences in results between these procedures, and also reviews the use of each technique in three major communication journals: Communication Monographs, Human Communication Research, and Communication Research.
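To make the PCA/EFA distinction concrete, the following sketch fits both methods to the same simulated two-factor data and prints the resulting loadings. scikit-learn's FactorAnalysis is used as a convenient EFA stand-in (with varimax rotation), and the simulated factor structure is an illustrative assumption.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n = 400
f = rng.normal(size=(n, 2))                               # two latent factors
loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
                     [0.0, 0.8], [0.1, 0.7], [0.0, 0.6]])
X = f @ loadings.T + rng.normal(scale=0.5, size=(n, 6))   # add unique variance

pca = PCA(n_components=2).fit(X)
efa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("PCA component loadings (rows = components):")
print(np.round(pca.components_, 2))
print("EFA varimax-rotated loadings (rows = factors):")
print(np.round(efa.components_, 2))
```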

7.
The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.  相似文献   
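A toy version of the idea above: a reduced variable neighborhood search that shakes the current subset by swapping a growing number of variables and accepts a move whenever it improves the objective. The objective used here (the sum of the k largest eigenvalues of the subset correlation matrix) and the neighborhood moves are simplified, illustrative choices; the heuristics evaluated in the paper are more elaborate.

```python
import numpy as np

def objective(R, subset, k=2):
    """Sum of the k largest eigenvalues of the correlation matrix of the subset."""
    eig = np.linalg.eigvalsh(R[np.ix_(subset, subset)])
    return eig[-k:].sum()

def vns_select(X, subset_size=4, k=2, max_shake=3, n_restarts=25, seed=0):
    rng = np.random.default_rng(seed)
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[1]
    best = sorted(int(i) for i in rng.choice(p, subset_size, replace=False))
    best_val = objective(R, best, k)
    for _ in range(n_restarts):
        shake = 1
        while shake <= max_shake:
            # Shaking: swap `shake` selected variables for unselected ones at random.
            out = rng.choice(best, shake, replace=False)
            pool = [j for j in range(p) if j not in best]
            new = rng.choice(pool, shake, replace=False)
            cand = sorted(set(best) - set(int(i) for i in out) | set(int(i) for i in new))
            val = objective(R, cand, k)
            if val > best_val:                  # move to the better subset, restart neighborhoods
                best, best_val, shake = cand, val, 1
            else:
                shake += 1
    return best, best_val

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 12))
X[:, [1, 4, 7, 10]] += X[:, [0, 0, 0, 0]] * 0.9   # create a correlated group of variables
print(vns_select(X))
```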

8.
The authors provide a didactic treatment of nonlinear (categorical) principal components analysis (PCA). This method is the nonlinear equivalent of standard PCA and reduces the observed variables to a number of uncorrelated principal components. The most important advantages of nonlinear over linear PCA are that it incorporates nominal and ordinal variables and that it can handle and discover nonlinear relationships between variables. Also, nonlinear PCA can deal with variables at their appropriate measurement level; for example, it can treat Likert-type scales ordinally instead of numerically. Every observed value of a variable can be referred to as a category. While performing PCA, nonlinear PCA converts every category to a numeric value, in accordance with the variable's analysis level, using optimal quantification. The authors discuss how optimal quantification is carried out, what analysis levels are, which decisions have to be made when applying nonlinear PCA, and how the results can be interpreted. The strengths and limitations of the method are discussed. An example applying nonlinear PCA to empirical data using the program CATPCA (J. J. Meulman, W. J. Heiser, & SPSS, 2004) is provided.  相似文献   

9.
Multiple‐set canonical correlation analysis and principal components analysis are popular data reduction techniques in various fields, including psychology. Both techniques aim to extract a series of weighted composites or components of observed variables for the purpose of data reduction. However, their objectives of performing data reduction are different. Multiple‐set canonical correlation analysis focuses on describing the association among several sets of variables through data reduction, whereas principal components analysis concentrates on explaining the maximum variance of a single set of variables. In this paper, we provide a unified framework that combines these seemingly incompatible techniques. The proposed approach embraces the two techniques as special cases. More importantly, it permits a compromise between the techniques in yielding solutions. For instance, we may obtain components in such a way that they maximize the association among multiple data sets, while also accounting for the variance of each data set. We develop a single optimization function for parameter estimation, which is a weighted sum of two criteria for multiple‐set canonical correlation analysis and principal components analysis. We minimize this function analytically. We conduct simulation studies to investigate the performance of the proposed approach based on synthetic data. We also apply the approach for the analysis of functional neuroimaging data to illustrate its empirical usefulness.  相似文献   

10.
In the diagnostic evaluation of educational systems, self-reports are commonly used to collect data, both cognitive and orectic. For various reasons, in these self-reports, some of the students' data are frequently missing. The main goal of this research is to compare the performance of different imputation methods for missing data in the context of the evaluation of educational systems. On an empirical database of 5,000 subjects, 72 conditions were simulated: three levels of missing data, three types of loss mechanisms, and eight methods of imputation. The levels of missing data were 5%, 10%, and 20%. The loss mechanisms were set at: Missing completely at random, moderately conditioned, and strongly conditioned. The eight imputation methods used were: listwise deletion, replacement by the mean of the scale, by the item mean, the subject mean, the corrected subject mean, multiple regression, and Expectation-Maximization (EM) algorithm, with and without auxiliary variables. The results indicate that the recovery of the data is more accurate when using an appropriate combination of different methods of recovering lost data. When a case is incomplete, the mean of the subject works very well, whereas for completely lost data, multiple imputation with the EM algorithm is recommended. The use of this combination is especially recommended when data loss is greater and its loss mechanism is more conditioned. Lastly, the results are discussed, and some future lines of research are analyzed.  相似文献   
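The following sketch compares three of the simpler recovery methods listed above (item-mean, subject-mean, and regression-based imputation) on a small matrix of simulated questionnaire scores, scoring each by the RMSE on the imputed cells. The data, the missingness rate, and the use of scikit-learn's IterativeImputer for the regression method are illustrative assumptions; the EM variants studied in the article are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
true = pd.DataFrame(rng.normal(loc=3, scale=1, size=(100, 8)),
                    columns=[f"item{i}" for i in range(8)])
obs = true.mask(rng.random(true.shape) < 0.10)        # roughly 10% of values missing

item_mean = obs.fillna(obs.mean())                    # fill with column (item) means
subject_mean = obs.T.fillna(obs.mean(axis=1)).T       # fill with row (subject) means
regression = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(obs),
                          columns=obs.columns)

for name, filled in [("item mean", item_mean),
                     ("subject mean", subject_mean),
                     ("regression", regression)]:
    rmse = np.sqrt(((filled - true)[obs.isna()] ** 2).mean().mean())
    print(f"{name:13s} RMSE on imputed cells: {rmse:.3f}")
```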

11.
Missing data are very common in longitudinal research. Through a Monte Carlo simulation study, this paper examines how the Diggle-Kenward selection model and the ML method, which rest on different assumptions, differ in the accuracy of their growth parameter estimates, taking into account sample size, the proportion of missing data, the distributional shape of the target variable, and different missingness mechanisms. The results show that: (1) the missingness mechanism has a substantial effect on the MAR-based ML method; under an MNAR mechanism, the MAR-based ML estimates of the intercept mean and slope mean in the latent growth model (LGM) are not robust. (2) The Diggle-Kenward selection model is more susceptible to skewness in the distribution of the target variable; sample size interacts with skewness, and the influence of skewness weakens when the sample size is large. The ML method, by contrast, is only slightly affected by skewness, and only under the MNAR mechanism.

12.
This article provides the theory and application of the 2-stage maximum likelihood (ML) procedure for structural equation modeling (SEM) with missing data. The validity of this procedure does not require the assumption of a normally distributed population. When the population is normally distributed and all missing data are missing at random (MAR), the direct ML procedure is nearly optimal for SEM with missing data. When missing data mechanisms are unknown, including auxiliary variables in the analysis will make the missing data mechanism more likely to be MAR. It is much easier to include auxiliary variables in the 2-stage ML than in the direct ML. Based on most recent developments for missing data with an unknown population distribution, the article first provides the least technical material on why the normal distribution-based ML generates consistent parameter estimates when the missing data mechanism is MAR. The article also provides sufficient conditions for the 2-stage ML to be a valid statistical procedure in the general case. For the application of the 2-stage ML, an SAS IML program is given to perform the first-stage analysis and EQS codes are provided to perform the second-stage analysis. An example with open- and closed-book examination data is used to illustrate the application of the provided programs. One aim is for quantitative graduate students/applied psychometricians to understand the technical details for missing data analysis. Another aim is for applied researchers to use the method properly.  相似文献   
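A stand-alone sketch of what the first stage computes: EM estimates of the saturated mean vector and covariance matrix from incomplete multivariate normal data. In the second stage (not shown) a structural equation model would be fitted to these estimates with appropriately corrected standard errors; the article implements the stages in SAS IML and EQS, so this NumPy version only illustrates the underlying computation under assumed data.

```python
import numpy as np

def em_mean_cov(X, n_iter=100):
    """EM estimates of the mean vector and covariance matrix of a normal sample with NaNs."""
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        s_x = np.zeros(p)
        s_xx = np.zeros((p, p))
        for row in X:
            miss = np.isnan(row)
            x = row.copy()
            cov_add = np.zeros((p, p))
            if miss.any():
                obs = ~miss
                # E-step: conditional mean and covariance of the missing part given the observed part.
                reg = sigma[np.ix_(miss, obs)] @ np.linalg.inv(sigma[np.ix_(obs, obs)])
                x[miss] = mu[miss] + reg @ (row[obs] - mu[obs])
                cov_add[np.ix_(miss, miss)] = sigma[np.ix_(miss, miss)] - reg @ sigma[np.ix_(obs, miss)]
            s_x += x
            s_xx += np.outer(x, x) + cov_add
        # M-step: update the saturated mean and covariance from the expected sufficient statistics.
        mu = s_x / n
        sigma = s_xx / n - np.outer(mu, mu)
    return mu, sigma

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=300)
Z[rng.random(Z.shape) < 0.15] = np.nan                     # 15% of entries missing
mu_hat, sigma_hat = em_mean_cov(Z)
print(np.round(mu_hat, 2))
print(np.round(sigma_hat, 2))
```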

13.
Parallel analysis (PA) is an often-recommended approach for assessment of the dimensionality of a variable set. PA is known in different variants, which may yield different dimensionality indications. In this article, the authors considered the most appropriate PA procedure to assess the number of common factors underlying ordered polytomously scored variables. They proposed minimum rank factor analysis (MRFA) as an extraction method, rather than the currently applied principal component analysis (PCA) and principal axes factoring. A simulation study, based on data with major and minor factors, showed that all procedures consistently point at the number of major common factors. A polychoric-based PA slightly outperformed a Pearson-based PA, but convergence problems may hamper its empirical application. In empirical practice, PA-MRFA with a 95% threshold based on polychoric correlations or, in case of nonconvergence, Pearson correlations with mean thresholds appear to be a good choice for identification of the number of common factors. PA-MRFA is a common-factor-based method and performed best in the simulation experiment. PA based on PCA with a 95% threshold is second best, as this method showed good performances in the empirically relevant conditions of the simulation experiment.  相似文献   
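A sketch of a PCA-based parallel analysis with a 95% threshold, one of the variants compared above (the recommended MRFA-based variant is not implemented here). Pearson correlations, the number of random data sets, and the simulated example are illustrative choices.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, quantile=95, seed=0):
    """Retain components whose observed eigenvalue exceeds the 95th percentile of random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sim_eig = np.empty((n_sims, p))
    for s in range(n_sims):
        R = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
        sim_eig[s] = np.linalg.eigvalsh(R)[::-1]
    threshold = np.percentile(sim_eig, quantile, axis=0)
    retained = 0
    for o, t in zip(obs_eig, threshold):    # count from the top while observed exceeds threshold
        if o > t:
            retained += 1
        else:
            break
    return retained, obs_eig, threshold

rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 2))
X = factors @ rng.normal(size=(2, 8)) + rng.normal(size=(300, 8))
print("number of components retained:", parallel_analysis(X)[0])
```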

14.
陈楠 (Chen Nan) & 刘红云 (Liu Hongyun), 《心理科学》 (Psychological Science), 2015(2), 446-451
For latent growth models containing data that are missing not at random, this study compares the relative merits of two missing-data treatments based on different assumptions: the maximum likelihood (ML) method and the Diggle-Kenward selection model. Through a Monte Carlo simulation study, the two methods are compared in terms of the accuracy of the growth parameter estimates and of their standard error estimates, taking into account the effects of sample size, the proportion of non-random missingness, and the proportion of random missingness. The results show that, when its assumptions hold, the Diggle-Kenward selection model generally yields more accurate parameter estimates than the ML method; for the standard error estimates, the ML method shows some degree of underestimation, and its confidence interval coverage rates are also clearly lower than those of the Diggle-Kenward selection model.

15.
Examinee‐selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non‐ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two‐dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non‐ignorable and to determine how to apply the new model to the data collected. Two follow‐up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non‐ignorable missing data were mistakenly treated as ignorable.  相似文献   

16.
JOB ANALYSIS: THE COMPOSITION OF SME SAMPLES
It is common for job analysts to solicit information from incumbents and supervisors (Subject Matter Experts or SMEs) when conducting a job analysis. These SMEs are asked to provide ratings on salient dimensions (e.g., frequency and importance of tasks that comprise the job). In constructing samples of SMEs for this purpose, it is reasonable to consider any possible influences that might bias or systematically influence the task ratings. The present paper considers the possible influence of SME demographic characteristics on task ratings of frequency. The tasks comprising the job of patrol officer in a large city were rated for frequency by approximately 700 incumbents. These ratings were gathered in two different years, 1982 and 1984. The total data set was used to conduct a components analysis of the 444-item task inventory. The first eight principal components were considered the dependent variables and four demographic characteristics the independent variables in an analysis of the 1982 data set. Analysis of variance and follow-up tests indicated that incumbent experience had a substantial influence on task ratings. Educational level and race had minimal effects on ratings. The sex factor was confounded by the experience factor, making interpretation of the sex effect equivocal. Mechanisms that might account for the experience effect are discussed. It was concluded that incumbent experience is a salient issue in job analysis using SME groups.  相似文献   

17.
The main purpose of this article is to develop a Bayesian approach for structural equation models with ignorable missing continuous and polytomous data. Joint Bayesian estimates of thresholds, structural parameters and latent factor scores are obtained simultaneously. The idea of data augmentation is used to solve the computational difficulties involved. In the posterior analysis, in addition to the real missing data, latent variables and latent continuous measurements underlying the polytomous data are treated as hypothetical missing data. An algorithm that embeds the Metropolis-Hastings algorithm within the Gibbs sampler is implemented to produce the Bayesian estimates. A goodness-of-fit statistic for testing the posited model is presented. It is shown that the proposed approach is not sensitive to prior distributions and can handle situations with a large number of missing patterns whose underlying sample sizes may be small. Computational efficiency of the proposed procedure is illustrated by simulation studies and a real example.The work described in this paper was fully supported by a grant from the Research Grants Council of the HKSAR (Project No. CUHK 4088/99H). The authors are greatly indebted to the Editor and anonymous reviewers for valuable comments in improving the paper; and also to D. E. Morisky and J.A. Stein for the use of their AIDS data set.  相似文献   

18.
Structural equation models (SEMs) have become widely used to determine the interrelationships between latent and observed variables in social, psychological, and behavioural sciences. As heterogeneous data are very common in practical research in these fields, the analysis of mixture models has received a lot of attention in the literature. An important issue in the analysis of mixture SEMs is the presence of missing data, in particular of data missing with a non‐ignorable mechanism. However, only a limited amount of work has been done in analysing mixture SEMs with non‐ignorable missing data. The main objective of this paper is to develop a Bayesian approach for analysing mixture SEMs with an unknown number of components and non‐ignorable missing data. A simulation study shows that Bayesian estimates obtained by the proposed Markov chain Monte Carlo methods are accurate and the Bayes factor computed via a path sampling procedure is useful for identifying the correct number of components, selecting an appropriate missingness mechanism, and investigating various effects of latent variables in the mixture SEMs. A real data set on a study of job satisfaction is used to demonstrate the methodology.  相似文献   

19.
A general approach for analyzing categorical data when there are missing data is described and illustrated. The method is based on generalized linear models with composite links. The approach can be used (among other applications) to fill in contingency tables with supplementary margins, fit loglinear models when data are missing, fit latent class models (without or with missing data on observed variables), fit models with fused cells (including many models from genetics), and to fill in tables or fit models to data when variables are more finely categorized for some cases than others. Both Newton-like and EM methods are easy to implement for parameter estimation.The author thanks the editor, the reviewers, Laurie Hopp Rindskopf, and Clifford Clogg for comments and suggestions that substantially improved the paper.  相似文献   

20.
Extended redundancy analysis (ERA) combines linear regression with dimension reduction to explore the directional relationships between multiple sets of predictors and outcome variables in a parsimonious manner. It aims to extract a component from each set of predictors in such a way that it accounts for the maximum variance of outcome variables. In this article, we extend ERA to the Bayesian framework, yielding Bayesian ERA (BERA). The advantages of BERA are threefold. First, BERA enables statistical inferences based on samples drawn from the joint posterior distribution of parameters obtained from a Markov chain Monte Carlo algorithm. As such, it does not require any resampling method, which ordinary (frequentist) ERA, by contrast, needs in order to test the statistical significance of parameter estimates. Second, it formally incorporates relevant information obtained from previous research into analyses by specifying informative power prior distributions. Third, BERA handles missing data by implementing multiple imputation using a Markov chain Monte Carlo algorithm, avoiding the potential bias of parameter estimates due to missing data. We assess the performance of BERA through simulation studies and apply BERA to real data regarding academic achievement.
