Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is applied twice to partition the object scores of respondents and the weights of variable categories. In this way, joint clusters that relate a subgroup of respondents exclusively to a subset of variable categories are obtained. The proposed method provides a low-dimensional map that displays variable category points and the centroids of joint clusters simultaneously. In addition, it offers joint-cluster memberships of variable categories as well as respondents. A Monte Carlo study is conducted to assess the parameter recovery capability of the proposed method based on synthetic data. An empirical application concerning Korean consumers' preferences toward various underwear brands and attributes is presented to demonstrate the effectiveness of the proposed method as compared with two relevant extant approaches.

2.
Milligan, Glenn W. Psychometrika, 1980, 45(3): 325-342
An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.
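The sensitivity of K-means to its starting seeds can be illustrated with a minimal sketch. The code below is not one of the starting procedures evaluated in the abstract; it assumes plain Lloyd-style iterations and simply keeps the lowest sum-of-squares solution over several random starts. All function names (`sse`, `kmeans`, `kmeans_multistart`) are hypothetical.

```python
import random

def sse(points, centers, labels):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )

def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm started from k randomly drawn seed points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, sse(points, centers, labels)

def kmeans_multistart(points, k, n_starts=20, seed=0):
    """Keep the lowest-SSE solution over several random starts (an assumed remedy)."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(n_starts)), key=lambda run: run[2])
```

Calling `kmeans_multistart(points, 3)` on a list of coordinate tuples returns the labels, centers, and error of the best run; a single call to `kmeans` shows how much the result can depend on the seeds.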

3.
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

4.
A comprehensive analysis of clustering techniques is presented in this paper through their application to data on meteorological conditions. Six partitional and hierarchical clustering techniques (k-means, k-medoids, SOM k-means, Agglomerative Hierarchical Clustering, and Clustering based on Gaussian Mixture Models) with different distance criteria, together with some clustering evaluation measures (the Calinski–Harabasz, Davies–Bouldin, Gap, and Silhouette criteria), present various analyses of the main climatic zones in Spain. Real-life data sets, recorded by AEMET (Spanish Meteorological Agency) at four of its weather stations, are analyzed in order to characterize the actual weather conditions at each location. The clustering techniques process the data on some of the main daily meteorological variables collected at these stations over six years between 2004 and 2010.

5.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.

6.
Brain activation detection is an important problem in fMRI data analysis. In this paper, we propose a data-driven activation detection method called neighborhood one-class SVM (NOC-SVM). Based on the probability distribution assumption of the one-class SVM algorithm and the neighborhood consistency hypothesis, NOC-SVM identifies a voxel as either an activated or non-activated voxel by a weighted distance between its near neighbors and a hyperplane in a high-dimensional kernel space. The proposed NOC-SVM is evaluated using both synthetic and real datasets. On two synthetic datasets with different SNRs, NOC-SVM performs better than K-means and fuzzy K-means clustering and is comparable to POM. On a real fMRI dataset, NOC-SVM can discover activated regions similar to those found by K-means and fuzzy K-means. These results show that the proposed algorithm is an effective activation detection method for fMRI data analysis. Furthermore, it is more stable than K-means and fuzzy K-means clustering.

7.
8.
In this paper we propose a latent class distance association model for clustering in the predictor space of large contingency tables with a categorical response variable. The rows of such a table are characterized as profiles of a set of explanatory variables, while the columns represent a single outcome variable. In many cases such tables are sparse, with many zero entries, which makes traditional models problematic. By clustering the row profiles into a few specific classes and representing these together with the categories of the response variable in a low‐dimensional Euclidean space using a distance association model, a parsimonious prediction model can be obtained. A generalized EM algorithm is proposed to estimate the model parameters and the adjusted Bayesian information criterion statistic is employed to test the number of mixture components and the dimensionality of the representation. An empirical example highlighting the advantages of the new approach and comparing it with traditional approaches is presented.

9.
This paper investigates thematic classification of homicides for the purpose of behavioural investigative analysis (e.g. offender profiling). Previous research has predominantly used smallest space analysis (SSA) to conceptualise and classify offences into thematic groups based on crime scene behaviour data. This paper introduces a combined approach utilising multiple correspondence analysis (MCA), cluster analysis (CA), and discriminant function analysis (DFA) to define and differentiate crime scenes into expressive or instrumental and impersonal or personal crimes. MCA is used to derive the latent structural dimensions in the crime data and produce quantitative scores for each offence along these dimensions. Two‐step CA was then utilised to classify offences. Offence dimensional scores were then used to predict cluster membership under DFA, producing cluster centroids corresponding to MCA dimensions. Centroids were plotted on the MCA correspondence map to simultaneously conceptualise crime classification and the latent structure of the Serbian crime data. Classification of offences based on MCA dimensional scores was 91.5% accurate. This MCA–CA–DFA approach may reduce some of the more subjective aspects of SSA methodology used in classification, whilst producing a product more amenable to objective and cumulative review. Implications for offender profiling research utilising SSA and this approach are discussed. Copyright © 2014 John Wiley & Sons, Ltd.

10.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications   Total citations: 3, self-citations: 0, citations by others: 3
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.
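The summarizing step can be sketched as follows. The abstract does not spell out the sum-score vector, so this sketch assumes that a subject's score on each attribute is the number of correctly answered items that require that attribute according to the item-by-attribute matrix. The function name `attribute_sum_scores` and the toy data are hypothetical.

```python
def attribute_sum_scores(responses, q_matrix):
    """Per-subject vector of attribute sum-scores.

    responses: one list of 0/1 item scores per subject.
    q_matrix:  one list per item, with 1 where the item requires the attribute.
    Assumption: the score for attribute k counts correct answers on items needing k.
    """
    n_attr = len(q_matrix[0])
    return [
        [sum(y * q_row[k] for y, q_row in zip(resp, q_matrix)) for k in range(n_attr)]
        for resp in responses
    ]

# Three items: item 1 needs attribute A, item 2 needs B, item 3 needs both.
q_matrix = [[1, 0], [0, 1], [1, 1]]
responses = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
scores = attribute_sum_scores(responses, q_matrix)
```

The resulting score vectors, rather than the raw responses, would then be passed to K-means or a hierarchical agglomerative procedure.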

11.
Clustering n objects into k groups under optimal scaling of variables   Total citations: 1, self-citations: 0, citations by others: 1
We propose a method to reduce many categorical variables to one variable with k categories, or stated otherwise, to classify n objects into k groups. Objects are measured on a set of nominal, ordinal or numerical variables or any mix of these, and they are represented as n points in p-dimensional Euclidean space. Starting from homogeneity analysis, also called multiple correspondence analysis, the essential feature of our approach is that these object points are restricted to lie at only one of k locations. It follows that these k locations must be equal to the centroids of all objects belonging to the same group, which corresponds to a sum of squared distances clustering criterion. The problem is not only to estimate the group allocation, but also to obtain an optimal transformation of the data matrix. An alternating least squares algorithm and an example are given. The authors thank Eveline Kroezen and Teije Euverman for their comments on a previous draft of this paper.

12.
Clustering individuals by measures of similarity or dissimilarity at trajectories of changes in longitudinal data enables determination of typical patterns of development and growth. The present research proposes a new constrained k‐means method with lower bound constraints on cluster proportions and distances among clusters at focused variables and time points to fulfill various needs in clustering longitudinal data. The method assumes a large number of clusters at the onset and iteratively deletes and combines clusters according to these constraints. An additional property of the proposed constrained k‐means includes direct estimation of the unknown number of clusters. Simulation results clearly show the usefulness of the method for extracting clusters in plausible, real‐life analysis including non‐normality within clusters, and the proposed algorithm works well and convergence of the estimates is satisfactory. An actual example using Japanese longitudinal data regarding sleep habits and mental health is presented to verify the utility of the proposed constrained k‐means.

13.
The clustering of two-mode proximity matrices is a challenging combinatorial optimization problem that has important applications in the quantitative social sciences. We focus on one particular type of problem related to the clustering of a two-mode binary matrix, which is relevant to the establishment of generalized blockmodels for social networks. In this context, clusters for the rows of the two-mode matrix intersect with clusters of the columns to form blocks, which should ideally be either complete (all 1s) or null (all 0s). A new procedure based on variable neighborhood search is presented and compared to an existing two-mode K-means clustering algorithm. The new procedure generally provided slightly greater explained variation; however, both methods yielded exceptional recovery of cluster structure.
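The idea of blocks that should be complete or null can be made concrete with a small sketch. It is not the criterion or search procedure of the abstract; it assumes a simple illustrative measure, namely the number of cells that disagree with the majority value of their block, for given row and column partitions. The function name `block_inconsistencies` is hypothetical.

```python
def block_inconsistencies(matrix, row_labels, col_labels):
    """Count cells that disagree with the majority value of their block.

    matrix:     binary two-mode matrix as a list of rows.
    row_labels: cluster label of each row.
    col_labels: cluster label of each column.
    """
    ones, size = {}, {}
    for r, row in zip(row_labels, matrix):
        for c, value in zip(col_labels, row):
            ones[(r, c)] = ones.get((r, c), 0) + value
            size[(r, c)] = size.get((r, c), 0) + 1
    # A block counts as complete if 1s are the majority and as null otherwise,
    # so its inconsistencies are the cells in the minority.
    return sum(min(n_ones, size[block] - n_ones) for block, n_ones in ones.items())
```

A value of 0 means every block is perfectly complete or null under the given partitions; a search heuristic would try to find row and column partitions that drive a criterion of this kind down.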

14.
Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
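The p-median objective and a greedy first stage can be sketched as follows. This is a minimal illustration under assumptions: the greedy rule shown (add the object that lowers the total distance the most) is one common variant and may differ from the heuristic used in the abstract, and the Lagrangian relaxation and branch-and-bound stages are not shown. The names `pmedian_cost` and `greedy_pmedian` are hypothetical.

```python
import math

def pmedian_cost(points, centers):
    """Sum over all objects of the Euclidean distance to the nearest chosen center.

    centers holds indices into points, since centers must be actual objects.
    """
    return sum(min(math.dist(p, points[c]) for c in centers) for p in points)

def greedy_pmedian(points, p):
    """Add, one at a time, the object whose selection lowers the cost the most."""
    centers = []
    for _ in range(p):
        candidates = [i for i in range(len(points)) if i not in centers]
        best = min(candidates, key=lambda i: pmedian_cost(points, centers + [i]))
        centers.append(best)
    return centers, pmedian_cost(points, centers)
```

Because the greedy solution is only a heuristic, its cost serves as an upper bound that exact stages can then try to improve or certify.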

15.
The emergence of Gaussian model‐based partitioning as a viable alternative to K‐means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model‐based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple‐restart (multistart) approach commonly used for K‐means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart–ILS procedures for minimizing a very general model‐based criterion that assumes no restrictions on cluster size or within‐group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10‐min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.

16.
Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of cluster solution based on the observed sum-of-squares error as compared to the minimum sum-of-squares error. The author was partially supported by the Office of Naval Research Grant #N00014-06-0106.

17.
A k-dimensional multivariate normal distribution is made discrete by partitioning the k-dimensional Euclidean space with rectangular grids. The collection of probability integrals over the partitioned cubes is a k-dimensional contingency table with ordered categories. It is shown that a loglinear model with main effects plus two-way interactions provides an accurate approximation for the k-dimensional table. The complete multivariate normal integral table is computed via the iterative proportional fitting algorithm from bivariate normal integral tables. This approach imposes no restriction on the correlation matrix. Comparisons with other numerical integration algorithms are reported. The approximation suggests association models for discretized multivariate normal distributions and contingency tables with ordered categories. The contingency-table approach occurred to me while I was collaborating with Paul Holland of the Educational Testing Service in 1985 on bivariate dependence functions. Holland maintains a belief that the continuous can learn from the discrete. This work is a reassertion of his claim. This research was sponsored by the National Science Council, Republic of China.
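Iterative proportional fitting from two-way tables can be sketched for the three-variable case. This is a minimal illustration, not the author's implementation: it assumes the three two-way margins are mutually consistent and strictly positive, starts from a uniform table, and cycles through the margins a fixed number of times. The function name `ipf_two_way_margins` is hypothetical, and in the setting of the abstract the margins would be bivariate normal integral tables.

```python
def ipf_two_way_margins(m12, m13, m23, n_iter=200):
    """Fit a 3-way table whose (1,2), (1,3) and (2,3) margins match the inputs.

    The result follows a loglinear model with main effects and two-way
    interactions only, because the starting table is uniform.
    """
    I, J, K = len(m12), len(m12[0]), len(m13[0])
    total = sum(map(sum, m12))
    t = [[[total / (I * J * K)] * K for _ in range(J)] for _ in range(I)]
    for _ in range(n_iter):
        # Rescale so that summing over variable 3 reproduces m12.
        for i in range(I):
            for j in range(J):
                s = sum(t[i][j])
                if s > 0:
                    t[i][j] = [v * m12[i][j] / s for v in t[i][j]]
        # Rescale so that summing over variable 2 reproduces m13.
        for i in range(I):
            for k in range(K):
                s = sum(t[i][j][k] for j in range(J))
                if s > 0:
                    for j in range(J):
                        t[i][j][k] *= m13[i][k] / s
        # Rescale so that summing over variable 1 reproduces m23.
        for j in range(J):
            for k in range(K):
                s = sum(t[i][j][k] for i in range(I))
                if s > 0:
                    for i in range(I):
                        t[i][j][k] *= m23[j][k] / s
    return t
```

A fixed iteration count keeps the sketch short; a practical version would stop once the largest margin discrepancy falls below a tolerance.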

18.
We used cluster analysis to identify children’s coping profiles and to examine self- and parent-reported correlates of coping in a community sample. Participants included 135 children (M age = 11.27, s.d. = .59) recruited from local public elementary and junior high schools and 116 of their parents. Analyses included hierarchical cluster analysis (Ward’s method), followed by non-hierarchical (k-means) cluster analysis to confirm the cluster solution. Results yielded four clusters reflecting high, active, low, and indiscriminant patterns of coping strategies. Members of the active coping group self-reported the fewest symptoms of distress and the greatest number of prosocial competencies after controlling for social desirability. No differences emerged for parent-reported psychosocial functioning across coping profiles. Our results suggest that a combination of active coping strategies may be associated with better psychosocial functioning than a combination of active and avoidant coping strategies.

19.
Neural network models are commonly used for cluster analysis in engineering, computational neuroscience, and the biological sciences, although they are rarely used in the social sciences. In this study we compare the classification capabilities of the 1-dimensional Kohonen neural network with two partitioning (Hartigan and Späth k-means) and three hierarchical (Ward's, complete linkage, and average linkage) cluster methods in 2,580 data sets with known cluster structure. Overall, the performance of the Kohonen networks was similar to, or better than, the performance of the other methods.

20.
In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object by variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin’s additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient by symptom data matrix.
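The reconstruction rule in point (3) can be illustrated with a short sketch. It shows only how model scores follow from a given binary membership matrix and given cluster profiles; it does not estimate either, and it is not the ADPROCLUS program's interface. The function name `reconstruct_scores` and the toy values are hypothetical.

```python
def reconstruct_scores(memberships, profiles):
    """Model scores under an additive profile clustering model.

    memberships: one 0/1 list per object, with 1 where the object belongs to a cluster.
    profiles:    one list of variable scores per cluster.
    Each object's score on a variable is the sum of the profiles of its clusters.
    """
    n_vars = len(profiles[0])
    return [
        [sum(a * profile[v] for a, profile in zip(row, profiles)) for v in range(n_vars)]
        for row in memberships
    ]

# Four objects, two overlapping clusters, three variables.
memberships = [[0, 0], [1, 0], [0, 1], [1, 1]]
profiles = [[1.0, 2.0, 0.0], [0.5, 0.0, 3.0]]
model_scores = reconstruct_scores(memberships, profiles)
```

An object in no cluster is reconstructed as all zeros, and an object in both clusters receives the sum of the two profiles, which is what allows the clusters to overlap.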
