Similar literature
 Found 20 similar articles (search time: 78 ms)
1.
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

2.
The clustering of two-mode proximity matrices is a challenging combinatorial optimization problem that has important applications in the quantitative social sciences. We focus on one particular type of problem related to the clustering of a two-mode binary matrix, which is relevant to the establishment of generalized blockmodels for social networks. In this context, clusters for the rows of the two-mode matrix intersect with clusters of the columns to form blocks, which should ideally be either complete (all 1s) or null (all 0s). A new procedure based on variable neighborhood search is presented and compared to an existing two-mode K-means clustering algorithm. The new procedure generally provided slightly greater explained variation; however, both methods yielded exceptional recovery of cluster structure.
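
The idea of blocks that should be either complete or null can be made concrete with a small sketch. The function name `block_inconsistencies` and the criterion itself (count the cells that disagree with the majority value of their block) are assumptions chosen for illustration; the abstract does not state the exact objective that the variable neighborhood search procedure optimizes.

```python
def block_inconsistencies(matrix, row_labels, col_labels):
    """Count cells that disagree with the majority value of their block.

    matrix     : list of rows, each a list of 0/1 entries
    row_labels : cluster index for each row
    col_labels : cluster index for each column
    A block is the intersection of one row cluster and one column cluster.
    """
    counts = {}  # (row cluster, column cluster) -> [number of 0s, number of 1s]
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            block = (row_labels[i], col_labels[j])
            counts.setdefault(block, [0, 0])[cell] += 1
    # A perfectly complete or perfectly null block contributes 0.
    return sum(min(zeros, ones) for zeros, ones in counts.values())


example = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
print(block_inconsistencies(example, [0, 0, 1, 1], [0, 0, 1, 1]))
```

In this toy matrix only one cell (row 4, column 2) breaks an otherwise null block, so a lower count indicates a partition closer to the ideal block structure.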

3.
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret: they show explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

4.
Functional magnetic resonance imaging (fMRI) plays an important role in pre-surgical planning for patients with resectable brain lesions such as tumors. With appropriately designed tasks, the results of fMRI studies can guide resection, thereby preserving vital brain tissue. The mass univariate approach to fMRI data analysis consists of performing a statistical test in each voxel, which is used to classify voxels as either active or inactive (that is, related or unrelated to the task of interest). In cognitive neuroscience, the focus is on controlling the rate of false positives while accounting for the severe multiple testing problem of searching the brain for activations. However, stringent control of false positives is accompanied by a risk of false negatives, which can be detrimental, particularly in clinical settings where false negatives may lead to surgical resection of vital brain tissue. Consequently, for clinical applications, we argue for a testing procedure with a stronger focus on preventing false negatives. We present a thresholding procedure that incorporates information on false positives and false negatives. We combine two measures of significance for each voxel: a classical p-value, which reflects evidence against the null hypothesis of no activation, and an alternative p-value, which reflects evidence against activation of a prespecified size. This results in a layered statistical map for the brain. One layer marks voxels exhibiting strong evidence against the traditional null hypothesis, while a second layer marks voxels where activation cannot be confidently excluded. The third layer marks voxels where the presence of activation can be rejected.
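
A minimal sketch of how two p-values per voxel could be turned into the three layers described above. The function name `layer`, the layer names, the shared threshold `alpha`, and the rule that the classical test takes precedence when both p-values are small are all assumptions made for illustration; the abstract does not give the actual thresholds or tie-breaking rule.

```python
def layer(p_null, p_alt, alpha=0.05):
    """Assign a voxel to one of three layers (hypothetical rule).

    p_null : classical p-value, evidence against "no activation"
    p_alt  : alternative p-value, evidence against "activation of a
             prespecified size"
    """
    if p_null <= alpha:
        return "active"      # strong evidence against the traditional null
    if p_alt <= alpha:
        return "inactive"    # presence of activation can be rejected
    return "uncertain"       # activation cannot be confidently excluded


# Hypothetical (p_null, p_alt) pairs for three voxels.
voxels = [(0.001, 0.80), (0.60, 0.01), (0.20, 0.30)]
print([layer(p0, p1) for p0, p1 in voxels])
```

Under this rule, the "uncertain" layer is the one that matters most clinically, because it flags tissue that a false-positive-only threshold would have labeled inactive.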

5.
A comprehensive analysis of clustering techniques is presented in this paper through their application to data on meteorological conditions. Six partitional and hierarchical clustering techniques (k-means, k-medoids, SOM k-means, Agglomerative Hierarchical Clustering, and Clustering based on Gaussian Mixture Models) with different distance criteria, together with some clustering evaluation measures (Calinski–Harabasz, Davies–Bouldin, Gap and Silhouette criterion clustering evaluation object), present various analyses of the main climatic zones in Spain. Real-life data sets, recorded by AEMET (Spanish Meteorological Agency) at four of its weather stations, are analyzed in order to characterize the actual weather conditions at each location. The clustering techniques process the data on some of the main daily meteorological variables collected at these stations over six years between 2004 and 2010.

6.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified, while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.

7.
A common representation of data within the context of multidimensional scaling (MDS) is a collection of symmetric proximity (similarity or dissimilarity) matrices for each of M subjects. There are a number of possible alternatives for analyzing these data, which include: (a) conducting an MDS analysis on a single matrix obtained by pooling (averaging) the M subject matrices, (b) fitting a separate MDS structure for each of the M matrices, or (c) employing an individual differences MDS model. We discuss each of these approaches, and subsequently propose a straightforward new method (CONcordance PARtitioning, ConPar), which can be used to identify groups of individual-subject matrices with concordant proximity structures. This method collapses the three-way data into a subject × subject dissimilarity matrix, which is subsequently clustered using a branch-and-bound algorithm that minimizes partition diameter. Extensive Monte Carlo testing revealed that, when compared to K-means clustering of the proximity data, ConPar generally provided better recovery of the true subject cluster memberships. A demonstration using empirical three-way data is also provided to illustrate the efficacy of the proposed method.

8.
Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
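
The p-median objective described above can be sketched in a few lines. This is an exhaustive search that is only practical for tiny data sets; it is not the three-stage procedure (greedy heuristic, Lagrangian relaxation, branch-and-bound) reported in the abstract, and the function names are chosen here for illustration.

```python
from itertools import combinations
from math import dist


def p_median_cost(points, centers):
    """Sum of Euclidean distances from each object to its nearest center.

    centers holds indices into points, because p-median centers must be
    actual objects rather than computed centroids.
    """
    return sum(min(dist(p, points[c]) for c in centers) for p in points)


def brute_force_p_median(points, p):
    """Try every set of p objects as centers and keep the cheapest one."""
    best = min(
        combinations(range(len(points)), p),
        key=lambda centers: p_median_cost(points, centers),
    )
    return best, p_median_cost(points, best)


# Two well-separated groups on a line (toy data).
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
print(brute_force_p_median(points, 2))  # ((1, 4), 4.0)
```

The contrast with K-means is visible in `p_median_cost`: distances are unsquared and are measured to a selected object, not to a cluster mean.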

9.
Decision making can be a complex process requiring the integration of several attributes of choice options. Understanding the neural processes underlying (uncertain) investment decisions is an important topic in neuroeconomics. We analyzed functional magnetic resonance imaging (fMRI) data from an investment decision study for stimulus-related effects. We propose a new technique for identifying activated brain regions: cluster, estimation, activation, and decision method. Our analysis is focused on clusters of voxels rather than voxel units. Thus, we achieve a higher signal-to-noise ratio within the unit tested and a smaller number of hypothesis tests compared with the often used General Linear Model (GLM). We propose to first conduct the brain parcellation by applying spatially constrained spectral clustering. The information within each cluster can then be extracted by the flexible dynamic semiparametric factor model (DSFM) dimension reduction technique and finally be tested for differences in activation between conditions. This sequence of Cluster, Estimation, Activation, and Decision admits a model-free analysis of the local fMRI signal. Applying a GLM on the DSFM-based time series resulted in a significant correlation between the risk of choice options and changes in fMRI signal in the anterior insula and dorsomedial prefrontal cortex. Additionally, individual differences in decision-related reactions within the DSFM time series predicted individual differences in risk attitudes as modeled with the framework of the mean-variance model.

10.
Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of cluster solution based on the observed sum-of-squares error as compared to the minimum sum-of-squares error. The author was partially supported by the Office of Naval Research Grant #N00014-06-0106.

11.
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is applied twice to partition the object scores of respondents and the weights of variable categories. In this way, joint clusters that relate a subgroup of respondents exclusively to a subset of variable categories are obtained. The proposed method provides a low-dimensional map of displaying variable category points and the centroids of joint clusters simultaneously. In addition, it offers joint-cluster memberships of variable categories as well as respondents. A Monte Carlo study is conducted to assess the parameter recovery capability of the proposed method based on synthetic data. An empirical application concerning Korean consumers' preferences toward various underwear brands and attributes is presented to demonstrate the effectiveness of the proposed method as compared with 2 relevant extant approaches.

12.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications   Total citations: 3, self-citations: 0, citations by others: 3
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
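
One way to picture the sum-score summary is sketched below. The abstract does not define the "particular vector of sum-scores", so the choice made here (for each attribute, add up a subject's responses to the items that require that attribute) is an assumption, as are the function name and the toy data.

```python
def attribute_sum_scores(responses, q_matrix):
    """Summarize each subject by one sum-score per attribute.

    responses : one list of 0/1 item responses per subject
    q_matrix  : item-by-attribute matrix; q_matrix[j][k] == 1 when
                item j requires attribute k
    """
    n_attributes = len(q_matrix[0])
    return [
        [
            sum(y * q[k] for y, q in zip(subject, q_matrix))
            for k in range(n_attributes)
        ]
        for subject in responses
    ]


# Three items, two attributes: item 3 requires both attributes.
q_matrix = [[1, 0], [0, 1], [1, 1]]
responses = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
print(attribute_sum_scores(responses, q_matrix))  # [[2, 1], [0, 1], [2, 2]]
```

The resulting vectors, one per subject, are what a K-means or hierarchical agglomerative routine would then cluster.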

13.
This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that although the usual K-means sum-of-squared-error criterion is not guaranteed to be minimal, a true underlying cluster structure may be more accurately recovered. Also, substantive interpretability seems generally improved when constrained solutions are considered. We illustrate the procedure with several data sets from the literature.
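
The dynamic programming step can be sketched as follows, assuming the object order is already given and, for simplicity, that each object is a single number. The quadratic assignment heuristic that produces the order is not shown, and the function names are illustrative rather than taken from the paper.

```python
def segment_sse(values):
    """Sum of squared deviations from the mean of one cluster."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def order_constrained_partition(values, k):
    """Split an ordered sequence into k consecutive clusters with minimal SSE."""
    n = len(values)
    inf = float("inf")
    # best[g][j]: minimal SSE when the first j objects form g clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):  # last cluster is values[i:j]
                if best[g - 1][i] == inf:
                    continue
                cost = best[g - 1][i] + segment_sse(values[i:j])
                if cost < best[g][j]:
                    best[g][j] = cost
                    cut[g][j] = i
    # Walk the stored cut points backwards to recover the clusters.
    clusters, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        clusters.append(values[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters


sse, clusters = order_constrained_partition([1.0, 1.2, 0.9, 5.0, 5.1, 9.0, 9.2], 3)
print(clusters)  # [[1.0, 1.2, 0.9], [5.0, 5.1], [9.0, 9.2]]
```

Because every cluster must be a run of consecutive objects, the recursion finds the best partition for the given order exactly, which is why the result can differ from an unconstrained K-means solution.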

14.
This article proposes a new, more efficient method to compute the minus two log likelihood, its gradient, and the Hessian for structural equation models (SEMs) in reticular action model (RAM) notation. The method exploits the beneficial aspect of RAM notation that the matrix derivatives used in RAM are sparse. For an SEM with K variables, P parameters, and P′ entries in the symmetrical or asymmetrical matrix of the RAM notation filled with parameters, the asymptotical run time of the algorithm is $O(P'K^2 + P^2K^2 + K^3)$. The naive implementation and numerical implementations are both $O(P^2K^3)$, so that for typical applications of SEM, the proposed algorithm is asymptotically K times faster than the best previously known algorithm. A simulation comparison with a numerical algorithm shows that the asymptotical efficiency is transferred to an applied computational advantage that is crucial for the application of maximum likelihood estimation, even in small, but especially in moderate or large, SEMs.

15.
A new nonmetric multidimensional scaling method is devised to analyze three-way data concerning inter-stimulus similarities obtained from many subjects. It is assumed that subjects are classified into a small number of clusters and that the stimulus configuration is specific to each cluster. Under this assumption, the classification of subjects and the scaling used to derive the configurations for clusters are simultaneously performed using an alternating least-squares algorithm. The monotone regression of ordinal similarity data, the scaling of stimuli and the K-means clustering of subjects are iterated in the algorithm. The method is assessed using a simulation and its practical use is illustrated with the analysis of real data. Finally, some extensions are considered.

16.
Attributions are constantly assigned in everyday life. A well-known phenomenon is the self-serving bias: that is, people’s tendency to attribute positive events to internal causes (themselves) and negative events to external causes (other persons/circumstances). Here, we investigated the neural correlates of the cognitive processes implicated in self-serving attributions using social situations that differed in their emotional saliences. We administered an attributional bias task during fMRI scanning in a large sample of healthy subjects (n = 71). Eighty sentences describing positive or negative social situations were presented, and subjects decided via button press whether the situation had been caused by themselves or by the other person involved. Comparing positive with negative sentences revealed activations of the bilateral posterior cingulate cortex (PCC). Self-attribution correlated with activation of the posterior portion of the precuneus. However, self-attributed positive versus negative sentences showed activation of the anterior portion of the precuneus, and self-attributed negative versus positive sentences demonstrated activation of the bilateral insular cortex. All significant activations were reported with a statistical threshold of p ≤ .001, uncorrected. In addition, a comparison of our fMRI task with data from the Internal, Personal and Situational Attributions Questionnaire, Revised German Version, demonstrated convergent validity. Our findings suggest that the precuneus and the PCC are involved in the evaluation of social events with particular regional specificities: The PCC is activated during emotional evaluation, the posterior precuneus during attributional evaluation, and the anterior precuneus during self-serving processes. Furthermore, we assume that insula activation is a correlate of awareness of personal agency in negative situations.

17.
18.
Milligan, Glenn W. Psychometrika, 1980, 45(3): 325-342
An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error-perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
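
The dependence of K-means on its starting points can be illustrated with a toy sketch. This is a plain Lloyd-style iteration written for this example; it is not one of the algorithms or starting procedures evaluated in the study, and the data and seed points are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, centroids, max_iter=100):
    """Run Lloyd-style K-means from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Corners of a wide rectangle: the natural split is left versus right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(lloyd_kmeans(points, [(0, 0), (4, 0)])[2])  # 1.0
print(lloyd_kmeans(points, [(2, 0), (2, 1)])[2])  # 16.0
```

Both runs converge, but the second start locks in a top-versus-bottom split with a much larger sum of squared errors, which is the kind of poor local solution that an informed starting procedure is meant to avoid.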

19.
The present study examined moderating effects of impulsivity on the relationships between promotive factors from family (family warmth, parental knowledge), school (school connectedness), and neighborhood (neighborhood cohesion) contexts with delinquency using data collected from N = 2,978 sixth to eighth graders from 16 schools surrounding a major city in the Midwestern United States. More than half of the respondents were non-Caucasian (M_age = 12.48; 41.0% male). Multilevel modeling analyses were conducted to take into account the clustering of the participants within schools. Impulsivity was positively associated with adolescent delinquency. Additionally, family warmth, parental knowledge, and school connectedness, but not neighborhood cohesion, were independently and inversely related to adolescent delinquency. Finally, impulsivity moderated relationships between family warmth and parental knowledge with delinquency but not relationships between school attachment and neighborhood cohesion with delinquency. Specifically, the negative relationship between family warmth and delinquency was significant for adolescents with high levels of, but not for those with below-average levels of, impulsivity. In addition, parental knowledge had a stronger association with decreased levels of delinquency for adolescents reporting higher levels of impulsivity. The moderating effects of impulsivity did not differ for males and females or for minority and non-minority participants. Findings indicate that impulsivity may have greater impact on adolescents’ susceptibility to positive family influences than on their susceptibility to promotive factors from school or neighborhood contexts. Implications for future research and practice are discussed.

20.
This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years. The K-means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions of K-means using different metrics or modifications of the original algorithm are given, leading to a unifying treatment of K-means and some of its extensions. Finally, several future studies are outlined that could enhance the understanding of numerous subtleties affecting the performance of the K-means method.
