Similar Documents
A total of 20 similar documents were found (search time: 31 ms).
1.
Ordinal data occur frequently in the social sciences. When applying principal component analysis (PCA), however, those data are often treated as numeric, implying linear relationships between the variables at hand; alternatively, non-linear PCA is applied, where the obtained quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components (PCs) is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects, and it offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than either unpenalized non-linear PCA or standard linear PCA. In particular, an application of penalized optimal scaling to ordinal data such as those collected with the International Classification of Functioning, Disability and Health (ICF) is provided.
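The core of the optimal-scaling step can be sketched in a few lines of Python. The sketch below implements only the unpenalized alternating-least-squares idea (quantify categories, fit a low-rank PCA, re-quantify from the reconstruction); the paper's smoothness penalty on the quantifications is only gestured at in a comment, and the toy data and function names are our own.

```python
import numpy as np

def standardize(x):
    return (x - x.mean()) / x.std()

def nonlinear_pca(X, n_comp=2, n_iter=30):
    # Alternate between a rank-k PCA of the quantified data and
    # re-quantifying each variable's categories as conditional means of
    # the PCA reconstruction (unpenalized optimal scaling; a penalized
    # version would additionally shrink the category quantifications
    # toward a smooth pattern at this step).
    n, p = X.shape
    Q = np.column_stack([standardize(X[:, j].astype(float)) for j in range(p)])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Q, full_matrices=False)
        recon = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]  # low-rank reconstruction
        for j in range(p):
            for c in np.unique(X[:, j]):          # one numeric value per category
                mask = X[:, j] == c
                Q[mask, j] = recon[mask, j].mean()
            Q[:, j] = standardize(Q[:, j])
    s = np.linalg.svd(Q, compute_uv=False)
    vaf = (s[:n_comp] ** 2).sum() / (s ** 2).sum()
    return Q, vaf

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 4))             # fake ordinal data, categories 1..5
Q, vaf = nonlinear_pca(X)
print(f"variance accounted for by 2 components: {vaf:.2f}")
```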

2.
Principal components analysis (PCA) is used to explore the structure of data sets containing linearly related numeric variables. Alternatively, nonlinear PCA can handle possibly nonlinearly related numeric as well as nonnumeric variables. For linear PCA, the stability of its solution can be established under the assumption of multivariate normality. For nonlinear PCA, however, standard options for establishing stability are not provided. The authors use the nonparametric bootstrap procedure to assess the stability of nonlinear PCA results, applied to empirical data. They use confidence intervals for the variable transformations and confidence ellipses for the eigenvalues, the component loadings, and the person scores. They discuss the balanced version of the bootstrap, bias estimation, and Procrustes rotation. To provide a benchmark, the same bootstrap procedure is applied to linear PCA on the same data. On the basis of the results, the authors advise using at least 1,000 bootstrap samples, using Procrustes rotation on the bootstrap results, examining the bootstrap distributions along with the confidence regions, and merging categories with small marginal frequencies to reduce the variance of the bootstrap results.
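A minimal sketch of the row-resampling bootstrap with Procrustes alignment, using NumPy and SciPy. The toy data, the choice of two components, and the percentile intervals are illustrative assumptions, not the authors' exact procedure (which also covers balanced resampling, bias estimation, and confidence ellipses).

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))    # fake correlated data
X = (X - X.mean(0)) / X.std(0)

def loadings(Z, k=2):
    vals, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(vals[idx])               # component loadings

L0 = loadings(X)
boot = []
for _ in range(1000):                                      # >= 1,000 resamples, as advised
    Zb = X[rng.integers(0, len(X), len(X))]                # resample rows with replacement
    Lb = loadings(Zb)
    R, _ = orthogonal_procrustes(Lb, L0)                   # align to the full-sample solution
    boot.append(Lb @ R)
ci = np.percentile(np.array(boot), [2.5, 97.5], axis=0)    # 95% percentile intervals
print("CI for variable 1, component 1:", np.round(ci[:, 0, 0], 2))
```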

3.
The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.
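A simplified rendering of the VNS idea in Python. The criterion below (total variance of all variables reproduced by the first k components of the subset) and the swap neighborhoods are illustrative assumptions; the heuristics in the paper may use different criteria, neighborhoods, and stopping rules.

```python
import numpy as np

def crit(R, subset, k=2):
    # Illustrative criterion: proportion of the total variance of ALL
    # variables reproduced by the first k PCs of the selected subset.
    vals, vecs = np.linalg.eigh(R[np.ix_(subset, subset)])
    idx = np.argsort(vals)[::-1][:k]
    pcs, lam = vecs[:, idx], vals[idx]
    B = R[:, subset] @ pcs                    # covariances of all vars with subset PCs
    return np.sum(B ** 2 / lam) / np.trace(R)

def vns(R, m=4, k=2, j_max=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    best = list(rng.choice(p, m, replace=False))
    j = 1
    for _ in range(iters):
        cand = best.copy()
        for _ in range(j):                    # "shaking": j random swaps
            cand[rng.integers(m)] = rng.choice([v for v in range(p) if v not in cand])
        improved = True
        while improved:                       # local search over single swaps
            improved = False
            for i in range(m):
                for v in range(p):
                    if v not in cand:
                        trial = cand.copy(); trial[i] = v
                        if crit(R, trial, k) > crit(R, cand, k):
                            cand, improved = trial, True
        if crit(R, cand, k) > crit(R, best, k):
            best, j = cand, 1                 # success: move and reset neighborhood size
        else:
            j = j % j_max + 1                 # failure: try a larger neighborhood
    return sorted(best), crit(R, best, k)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10)); X[:, 5:] += X[:, :5]   # redundant variable pool
print(vns(np.corrcoef(X, rowvar=False)))
```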

4.
5.
6.
When analyzing data, researchers are often confronted with a model selection problem (e.g., determining the number of components/factors in principal components analysis [PCA]/factor analysis or identifying the most important predictors in a regression analysis). To tackle such a problem, researchers may apply some objective procedure, like parallel analysis in PCA/factor analysis or stepwise selection methods in regression analysis. A drawback of these procedures is that they can only be applied to the model selection problem at hand. An interesting alternative is the CHull model selection procedure, which was originally developed for multiway analysis (e.g., multimode partitioning). However, the key idea behind the CHull procedure—identifying a model that optimally balances model goodness of fit/misfit and model complexity—is quite generic. Therefore, the procedure may also be used when applying many other analysis techniques. The aim of this article is twofold. First, we demonstrate the wide applicability of the CHull method by showing how it can be used to solve various model selection problems in the context of PCA, reduced K-means, best-subset regression, and partial least squares regression. Moreover, a comparison of CHull with standard model selection methods for these problems is performed. Second, we present the CHULL software, which may be downloaded from http://ppw.kuleuven.be/okp/software/CHULL/, to assist the user in applying the CHull procedure.
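A minimal reading of the CHull idea in Python: keep models on the upper convex hull of the fit-versus-complexity plot and select the point where the ratio of successive slopes peaks. The hull construction and toy numbers are our own; the CHULL software adds refinements not shown here.

```python
import numpy as np

def chull(complexity, fit):
    # Keep models on the upper convex hull, then pick the interior hull
    # point maximizing the ratio of the slope before it to the slope after it.
    order = np.argsort(complexity)
    c, f = np.asarray(complexity)[order], np.asarray(fit)[order]
    hull = [0]
    for i in range(1, len(c)):                # monotone scan for the upper hull
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # drop i1 if it lies on or below the chord from i0 to i
            if (f[i1] - f[i0]) * (c[i] - c[i0]) <= (f[i] - f[i0]) * (c[i1] - c[i0]):
                hull.pop()
            else:
                break
        hull.append(i)
    best, best_st = None, -np.inf
    for j in range(1, len(hull) - 1):
        a, b, d = hull[j - 1], hull[j], hull[j + 1]
        st = ((f[b] - f[a]) / (c[b] - c[a])) / ((f[d] - f[b]) / (c[d] - c[b]))
        if st > best_st:
            best, best_st = b, st
    return (c[best], f[best]) if best is not None else None

# e.g., number of PCA components vs. proportion of variance accounted for
comps = [1, 2, 3, 4, 5, 6]
vaf   = [0.35, 0.55, 0.68, 0.71, 0.73, 0.74]
print(chull(comps, vaf))   # the elbow: 3 components here
```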

7.
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix independently and concurrently, thus destroying the entire correlational structure of the data. This strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but is not suitable for assessing the significance of the contribution of single variables. Alternatively, we propose a strategy involving permutation of one variable at a time, while keeping the other variables fixed. We compare the two approaches in a simulation study, considering proportions of Type I and Type II error. We use two corrections for multiple testing: the Bonferroni correction and controlling the False Discovery Rate (FDR). To assess the significance of the variance accounted for by the variables, permuting one variable at a time, combined with FDR correction, yields the most favorable results. This optimal strategy is applied to an empirical data set, and results are compared with bootstrap confidence intervals.
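The proposed strategy is easy to sketch: permute one variable at a time, recompute the variance accounted for, and FDR-correct the resulting p values. The sketch below assumes a correlation-based PCA and uses statsmodels' Benjamini-Hochberg correction; the data and component count are illustrative.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

def vaf_per_variable(X, k=2):
    # variance of each variable accounted for by the first k PCs
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    idx = np.argsort(vals)[::-1][:k]
    L = vecs[:, idx] * np.sqrt(vals[idx])
    return (L ** 2).sum(axis=1)

rng = np.random.default_rng(3)
n, p = 200, 6
X = rng.normal(size=(n, p))
X[:, :4] += rng.normal(size=(n, 1))      # four structured variables, two pure noise
obs = vaf_per_variable(X)

n_perm = 499
pvals = np.ones(p)
for j in range(p):                       # permute ONE variable, keep the others fixed
    exceed = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        exceed += vaf_per_variable(Xp)[j] >= obs[j]
    pvals[j] = (exceed + 1) / (n_perm + 1)

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(np.round(pvals, 3), reject)
```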

8.
Several methods have been developed for the analysis of a mixture of qualitative and quantitative variables, and one, called PCAMIX, includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. The present paper proposes several techniques for simple structure rotation of a PCAMIX solution based on the rotation of component scores and indicates how these can be viewed as generalizations of the simple structure methods for PCA. In addition, a recently developed technique for the analysis of mixtures of qualitative and quantitative variables, called INDOMIX, is shown to construct component scores (without rotational freedom) maximizing the quartimax criterion over all possible sets of component scores. A numerical example is used to illustrate the implication that when used for qualitative variables, INDOMIX provides axes that discriminate between the observation units better than do those generated from MCA.

9.
Exploratory factor analysis is a popular statistical technique used in communication research. Although exploratory factor analysis (EFA) and principal components analysis (PCA) are different techniques, PCA is often employed incorrectly to reveal latent constructs (i.e., factors) of observed variables, which is the purpose of EFA. PCA is more appropriate for reducing measured variables into a smaller set of variables (i.e., components) while retaining as much of the total variance in the measured variables as possible. Furthermore, the popular use of varimax rotation raises some concerns about the relationships among the factors that researchers claim to discover. This paper discusses the distinct purposes of PCA and EFA, uses two data sets as examples to highlight the differences in results between the two procedures, and reviews the use of each technique in three major communication journals: Communication Monographs, Human Communication Research, and Communication Research.
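The difference is easy to see on simulated data with a known factor structure: PCA loadings absorb unique variance, while EFA loadings model only the shared variance. A hedged sketch with scikit-learn (the toy loading matrix and noise level are our own):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(4)
n = 500
f = rng.normal(size=(n, 2))                       # two latent factors
W = np.array([[.8, 0], [.7, 0], [.6, 0],          # simple-structure loadings
              [0, .8], [0, .7], [0, .6]])
X = f @ W.T + 0.5 * rng.normal(size=(n, 6))       # observed = common part + unique noise
X = (X - X.mean(0)) / X.std(0)

pca = PCA(n_components=2).fit(X)
fa  = FactorAnalysis(n_components=2).fit(X)

# PCA "loadings" absorb unique variance; FA loadings model only shared variance
print("PCA:", np.round(pca.components_.T * np.sqrt(pca.explained_variance_), 2))
print("FA: ", np.round(fa.components_.T, 2))
```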

10.
To deal with missing data that arise due to participant nonresponse or attrition, methodologists have recommended an “inclusive” strategy where a large set of auxiliary variables are used to inform the missing data process. In practice, the set of possible auxiliary variables is often too large. We propose using principal components analysis (PCA) to reduce the number of possible auxiliary variables to a manageable number. A series of Monte Carlo simulations compared the performance of the inclusive strategy with eight auxiliary variables (inclusive approach) to the PCA strategy using just one principal component derived from the eight original variables (PCA approach). We examined the influence of four independent variables: magnitude of correlations, rate of missing data, missing data mechanism, and sample size on parameter bias, root mean squared error, and confidence interval coverage. Results indicate that the PCA approach yields unbiased parameter estimates and potentially greater accuracy than the inclusive approach. We conclude that using the PCA strategy to reduce the number of auxiliary variables is an effective and practical way to reap the benefits of the inclusive strategy in the presence of many possible auxiliary variables.
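Operationally, the PCA approach amounts to replacing the auxiliary block by its first component score before fitting the missing-data model. A minimal sketch with fabricated auxiliary variables:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n = 300
aux = rng.normal(size=(n, 8))                # eight candidate auxiliary variables
aux += aux[:, [0]] * 0.5                     # give them some common variance

# reduce the auxiliary block to a single principal component score
z = (aux - aux.mean(0)) / aux.std(0)
pc1 = PCA(n_components=1).fit_transform(z).ravel()

# pc1 would then enter the missing-data model (e.g., FIML or multiple
# imputation) as the lone auxiliary variable instead of all eight
print(np.round(pc1[:5], 2))
```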

11.
Principal components analysis of sampled functions
This paper describes a technique for principal components analysis of data consisting of n functions each observed at p argument values. This problem arises particularly in the analysis of longitudinal data in which some behavior of a number of subjects is measured at a number of points in time. In such cases information about the behavior of one or more derivatives of the function being sampled can often be very useful, as for example in the analysis of growth or learning curves. It is shown that the use of derivative information is equivalent to a change of metric for the row space in classical principal components analysis. The reproducing kernel for the Hilbert space of functions plays a central role and defines the best interpolating functions, which are generalized spline functions. An example is offered of how sensitivity to derivative information can reveal interesting aspects of the data.
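A basic discretized version of functional PCA can be written directly in NumPy by using the quadrature weight as the metric; the paper's derivative-based metrics and spline interpolants go beyond this sketch, and the simulated curves are our own.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 50)                          # p argument values
amps = rng.uniform(1, 2, 80)                       # n = 80 subjects
mids = rng.uniform(.4, .6, 80)
curves = np.array([a * np.exp(-(t - m) ** 2 / 0.02) for a, m in zip(amps, mids)])

dt = t[1] - t[0]
dev = curves - curves.mean(0)
vals, vecs = np.linalg.eigh(dev.T @ dev / len(dev) * dt)  # quadrature-weighted covariance
idx = np.argsort(vals)[::-1]
eigenfun = vecs[:, idx[:2]] / np.sqrt(dt)          # first two eigenfunctions, L2-normalized
scores = dev @ eigenfun * dt                       # functional PC scores per subject
print(np.round(vals[idx[:2]] / vals.sum(), 2))     # variance share of each component
```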

12.
The use of principal components analysis (PCA) for the study of evoked-response data may be complicated by variations from one trial to another in the latency of underlying brain events. Such variation can come either from random intra- and intersubject variability or from the effects of independent variables that are manipulated between conditions. The effect of such variability is investigated by simulation of these latency-varying events and by analysis of evoked responses in a behavioral task, the Sternberg memory search task, which is well known to generate variation in the latency of brain events. The results of PCA of within-subjects differences in these two situations are plausibly related to underlying stages of information processing, and the technique may augment reaction time data by providing information on the time of occurrence as well as the duration of stages of information processing.
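The smearing effect of latency jitter is easy to reproduce: simulate trials with one fixed and one latency-varying component and inspect how the variance spreads across PCs. A sketch with fabricated ERP-like data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
t = np.arange(0, 600, 4)                           # 4-ms samples over 600 ms

def component(latency, width=60):
    return np.exp(-(t - latency) ** 2 / (2 * width ** 2))

# fake ERPs: a fixed early component plus a late one with jittered latency
trials = np.array([component(150) + component(rng.normal(380, 40))
                   for _ in range(200)])

pca = PCA(n_components=3).fit(trials - trials.mean(0))
# latency jitter smears the late component's variance across several PCs
print(np.round(pca.explained_variance_ratio_, 2))
```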

13.
Five hens, experienced in discriminating two categories of multidimensional geometrical figures presented in fixed pairs in a simultaneous discrimination task, were tested with familiar figures arranged as new pairs to assess the dependence of categorization performance on learned relational or configural cues. Test performance did not differ from training: relational or configural cues still influenced discrimination performance. It was suggested that, in accordance with exemplar theories, this influence depended on differences between pairs of probe exemplars that facilitate retrieval of learned category members. To test whether exemplar, feature, or prototype theory was most suitable to explain categorization by chickens, the rates of pecking at exemplars were analysed using principal components analysis (PCA). The distribution of the exemplars' loadings on the single component obtained was examined in the light of the conditions dictated by the three types of theories on how representative category exemplars should be. The least constraining theory, i.e. the exemplar theory, was most suitable. Defining factors of classificatory behaviour are discussed with a special emphasis on the characteristics of category-defining stimulus attributes.

14.
Analyzing Additional Variables in the Theory of Reasoned Action
This study examined the convergent, discriminant, and predictive validity of several variables proposed to augment the theory of reasoned action (TRA), using both principal components analysis (PCA)/multiple regression and confirmatory factor analysis (CFA)/structural equation modeling (SEM) among a sample of the UK population regarding their intention to have a child. PCA revealed good convergent and discriminant validity for attitude vs. anticipated regret and for subjective norm vs. moral norm vs. social relations, but not for intention vs. desire or perceived behavioral control. Multiple regression analyses showed that the additional variables predicted a significant increment in the variance in intention. CFA, however, showed moderate convergent validity and poor discriminant validity, and the structural model comprised only the two predictors from the TRA.

15.
In the distance approach to nonlinear multivariate data analysis the focus is on the optimal representation of the relationships between the objects in the analysis. In this paper two methods are presented for including weights in distance-based nonlinear multivariate data analysis. In the first method, weights are assigned to the objects while the second method is concerned with differential weighting of groups of variables. When each analysis variable defines a group the latter method becomes a variable weighting method. For objects the weights are assumed to be given; for groups of variables they may be given, or estimated. These weighting schemes can also be combined and have several important applications. For example, they make it possible to perform efficient analyses of large data sets, to use the distance-based variety of nonlinear multivariate data analysis as an addition to loglinear analysis of multiway contingency tables, and to do stability studies of the solutions by applying the bootstrap on the objects or the variables in the analysis. These and other applications are discussed, and an efficient algorithm is proposed to minimize the corresponding loss function.

16.
This paper extends the biplot technique to canonical correlation analysis and redundancy analysis. The plot of structure correlations is shown to be optimal for displaying the pairwise correlations between the variables of one set and those of the second. The link between multivariate regression and canonical correlation analysis/redundancy analysis is exploited for producing an optimal biplot that displays a matrix of regression coefficients. This plot can be made from the canonical weights of the predictors and the structure correlations of the criterion variables. An example is used to show how the proposed biplots may be interpreted.
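The coordinates of such a biplot can be computed from any CCA fit: canonical weights for the predictors and variable-variate (structure) correlations for the criteria. The sketch below uses scikit-learn's CCA, whose iterative estimation and scaling may differ from the paper's derivation; the data are fabricated.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(8)
n = 200
X = rng.normal(size=(n, 4))                                       # predictor set
Y = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(n, 3))  # criterion set

cca = CCA(n_components=2).fit(X, Y)
U, V = cca.transform(X, Y)                     # canonical variates

# biplot coordinates: canonical weights for the predictors,
# structure correlations (variable-variate correlations) for the criteria
pred_coords = cca.x_weights_
crit_coords = np.array([[np.corrcoef(Y[:, j], U[:, k])[0, 1]
                         for k in range(2)] for j in range(Y.shape[1])])
print(np.round(pred_coords, 2))
print(np.round(crit_coords, 2))
```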

17.
A unified treatment of the weighting problem
A general procedure is described for obtaining weighted linear combinations of variables. This includes, as special cases, multiple regression weights, canonical variate analysis, principal components, maximizing composite reliability, canonical factor analysis, and certain other well-known methods. The general procedure is shown to yield certain desirable invariance properties with respect to transformations of the variables.

18.
Parallel analysis (PA) is an often-recommended approach for assessing the dimensionality of a variable set. PA exists in several variants, which may yield different indications of dimensionality. In this article, the authors considered the most appropriate PA procedure to assess the number of common factors underlying ordered polytomously scored variables. They proposed minimum rank factor analysis (MRFA) as the extraction method, rather than the currently applied principal component analysis (PCA) and principal axis factoring. A simulation study, based on data with major and minor factors, showed that all procedures consistently point at the number of major common factors. A polychoric-based PA slightly outperformed a Pearson-based PA, but convergence problems may hamper its empirical application. In empirical practice, PA-MRFA with a 95% threshold based on polychoric correlations, or, in case of nonconvergence, Pearson correlations with mean thresholds, appears to be a good choice for identifying the number of common factors. PA-MRFA is a common-factor-based method and performed best in the simulation experiment. PA based on PCA with a 95% threshold is second best, as this method showed good performance in the empirically relevant conditions of the simulation experiment.
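For reference, the familiar PCA-based variant of PA with a 95% threshold (the article's second-best performer) fits in a short function; MRFA extraction, the preferred option, would replace the eigenvalue step and is not shown here. Data and defaults are illustrative.

```python
import numpy as np

def parallel_analysis(X, n_sets=500, q=95, seed=0):
    # PCA-based PA: keep components whose sample eigenvalues exceed the
    # 95th percentile of eigenvalues from random normal data of the same size.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.array([np.sort(np.linalg.eigvalsh(
        np.corrcoef(rng.normal(size=(n, p)), rowvar=False)))[::-1]
        for _ in range(n_sets)])
    thresh = np.percentile(rand, q, axis=0)
    keep = 0
    for o, th in zip(obs, thresh):            # stop at the first non-exceedance
        if o > th:
            keep += 1
        else:
            break
    return keep

rng = np.random.default_rng(9)
F = rng.normal(size=(300, 2))                               # two major factors
X = F @ rng.normal(size=(2, 8)) + rng.normal(size=(300, 8))
print(parallel_analysis(X))                                 # expected: 2
```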

19.
In many human movement studies, angle-time series data are measured on several groups of individuals. Current methods for comparing groups include comparing the mean value in each group or applying multivariate techniques such as principal components analysis and performing tests on the principal component scores. Such methods have been useful, though they discard a large amount of information. Functional data analysis (FDA) is an emerging statistical analysis technique in human movement research that treats the angle-time series data as a function rather than a series of discrete measurements, thereby retaining all of the information in the data. Functional principal components analysis (FPCA) is an extension of multivariate principal components analysis that examines the variability of a sample of curves; it has been used to examine differences in movement patterns among several groups of individuals. Currently the functional principal components (FPCs) for each group are either determined separately (yielding components that are group-specific) or obtained by combining the data for all groups and determining the FPCs of the combined data (yielding components that summarize the entire data set). The group-specific FPCs contain both within- and between-group variation, and issues arise when comparing FPCs across groups if the order of the FPCs differs from group to group. The FPCs of the combined data may not adequately describe all groups of individuals, and comparisons between groups typically use t-tests of the mean FPC scores in each group. When these differences are statistically non-significant, it can be difficult to determine how a particular intervention is affecting movement patterns or how injured subjects differ from controls. In this paper we aim to perform FPCA in a manner that allows sensible comparisons between groups of curves. A statistical technique called common functional principal components analysis (CFPCA) is implemented. CFPCA identifies the common sources of variation evident across groups but allows the order of each component to change for a particular group, which allows the direct comparison of components across groups. We use our method to analyze a biomechanical data set examining the mechanisms of chronic Achilles tendon injury and the functional effects of orthoses.
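A simplified stand-in for CFPCA is to extract eigenfunctions from the pooled within-group covariance so that every group is scored on the same components; the actual common principal components model is estimated differently (e.g., by maximum likelihood) and allows component order to vary by group. The toy curves below are our own.

```python
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 1, 40)

def group(n, shift):
    # fake angle-time curves for one group, with a group-specific timing shift
    return np.array([rng.uniform(.8, 1.2) *
                     np.sin(2 * np.pi * (t - shift + rng.normal(0, .03)))
                     for _ in range(n)])

g1, g2 = group(30, 0.0), group(30, 0.1)              # e.g., injured vs. control

# pooled within-group covariance -> one common set of components
C = (np.cov(g1, rowvar=False) + np.cov(g2, rowvar=False)) / 2
vals, vecs = np.linalg.eigh(C)
common_pcs = vecs[:, np.argsort(vals)[::-1][:2]]

grand = np.vstack([g1, g2]).mean(0)
s1, s2 = (g1 - grand) @ common_pcs, (g2 - grand) @ common_pcs
print(np.round(s1.mean(0), 2), np.round(s2.mean(0), 2))  # groups on the SAME axes
```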

20.
Principal component analysis (PCA) and common factor analysis are often used to model latent data structures. Typically, such analyses assume a single population whose correlation or covariance matrix is modelled. However, data may sometimes be unwittingly sampled from mixed populations containing a taxon (a nonarbitrary subpopulation) and its complement class. Relations are derived between the values of PCA parameters within subpopulations and their values in the mixed population, and these results are then extended to factor analysis in mixed populations. Because the relationships between subpopulation and mixed-population principal components and factors depend sensitively on within-subpopulation structures and between-subpopulation differences, naive interpretation of PCA or factor analytic findings can mislead. Several analyses, better suited to the dimensional analysis of admixed data structures, are presented and compared.
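The basic relation is that the mixture covariance equals the pooled within-group covariance plus a between-means term, so mixture PCs need not resemble within-group PCs. A simulation sketch for two equally weighted subpopulations:

```python
import numpy as np

rng = np.random.default_rng(11)
Sw = np.array([[1.0, 0.6], [0.6, 1.0]])              # same within-group structure
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, -1.0])
g1 = rng.multivariate_normal(mu1, Sw, 500)           # complement class
g2 = rng.multivariate_normal(mu2, Sw, 500)           # taxon
mix = np.vstack([g1, g2])

# mixture covariance ~= pooled within covariance + between-means term
within = (np.cov(g1, rowvar=False) + np.cov(g2, rowvar=False)) / 2
d = (g1.mean(0) - g2.mean(0)).reshape(-1, 1)
print(np.round(np.cov(mix, rowvar=False), 2))
print(np.round(within + 0.25 * d @ d.T, 2))          # equal mixing proportions

def pc1(S):
    vals, vecs = np.linalg.eigh(S)
    return np.round(vecs[:, -1], 2)
print(pc1(within), pc1(np.cov(mix, rowvar=False)))   # leading PCs disagree
```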
