Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
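
The greedy stage mentioned in this abstract can be illustrated with a minimal sketch. It assumes a precomputed dissimilarity matrix and a simple "add the best center one at a time" rule; the function names `greedy_p_median` and `p_median_cost` are hypothetical, and the abstract does not say which greedy variant the authors used. The Lagrangian relaxation and branch-and-bound stages are not shown.

```python
from typing import List, Sequence


def p_median_cost(dist: Sequence[Sequence[float]], centers: Sequence[int]) -> float:
    """Sum over all objects of the dissimilarity to the nearest chosen center."""
    return sum(min(dist[i][c] for c in centers) for i in range(len(dist)))


def greedy_p_median(dist: Sequence[Sequence[float]], p: int) -> List[int]:
    """Pick p centers one at a time, each time adding the object that
    lowers the total assignment cost the most (ties go to the lowest index)."""
    n = len(dist)
    if not 1 <= p <= n:
        raise ValueError("p must be between 1 and the number of objects")
    centers: List[int] = []
    # nearest[i] = dissimilarity from object i to its closest center so far
    nearest = [float("inf")] * n
    for _ in range(p):
        best_cost, best_candidate = float("inf"), -1
        for c in range(n):
            if c in centers:
                continue
            cost = sum(min(nearest[i], dist[i][c]) for i in range(n))
            if cost < best_cost:
                best_cost, best_candidate = cost, c
        centers.append(best_candidate)
        nearest = [min(nearest[i], dist[i][best_candidate]) for i in range(n)]
    return centers
```

On six points at positions 0, 1, 2, 10, 11, 12 with p = 2, this sketch selects objects 2 and 4 (cost 5), whereas objects 1 and 4 give cost 4. A greedy solution is therefore only an upper bound, which is consistent with the abstract's use of further stages to reach a global optimum.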

2.
Brain activation detection is an important problem in fMRI data analysis. In this paper, we propose a data-driven activation detection method called neighborhood one-class SVM (NOC-SVM). Based on the probability distribution assumption of the one-class SVM algorithm and the neighborhood consistency hypothesis, NOC-SVM identifies a voxel as either an activated or non-activated voxel by a weighted distance between its near neighbors and a hyperplane in a high-dimensional kernel space. The proposed NOC-SVM is evaluated using both synthetic and real datasets. On two synthetic datasets with different SNRs, NOC-SVM performs better than K-means and fuzzy K-means clustering and is comparable to POM. On a real fMRI dataset, NOC-SVM can discover activated regions similar to those found by K-means and fuzzy K-means. These results show that the proposed algorithm is an effective activation detection method for fMRI data analysis. Furthermore, it is more stable than K-means and fuzzy K-means clustering.

3.
The clustering of two-mode proximity matrices is a challenging combinatorial optimization problem that has important applications in the quantitative social sciences. We focus on one particular type of problem related to the clustering of a two-mode binary matrix, which is relevant to the establishment of generalized blockmodels for social networks. In this context, clusters for the rows of the two-mode matrix intersect with clusters of the columns to form blocks, which should ideally be either complete (all 1s) or null (all 0s). A new procedure based on variable neighborhood search is presented and compared to an existing two-mode K-means clustering algorithm. The new procedure generally provided slightly greater explained variation; however, both methods yielded exceptional recovery of cluster structure.

4.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified, while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.

5.
The purpose of the current study was to build on the emerging effort to produce a meaningful typology of classroom behavior for elementary school age children. The Behavior Assessment System for Children (BASC) Teacher Rating Scales for Children (TRS-C) norming data were collected for 1,227 six- to eleven-year-old children at 116 sites representing various regions of the United States. The TRS-C has 148 items that are rated by the teacher on a 4-point scale of frequency, ranging from Never to Almost always. The Ward method of cluster analysis was used to identify the initial centroids or cluster seeds. An iterative clustering method, a K-means procedure, was used to refine the Ward cluster solution. A seven-cluster solution was selected based on both rational and empirical considerations. The resulting clusters were named well-adapted, average, learning disorder, disruptive behavior disorder, physical complaints and worry, severe psychopathology, and mildly disruptive. The seven-cluster solution resembles those of Achenbach (1991), Curry and Thompson (1985), and other researchers. The resulting typology points the way toward future cluster studies of child psychopathology by delineating additional research and theoretical questions.

6.
Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of a cluster solution based on the observed sum-of-squares error as compared to the minimum sum-of-squares error. The author was partially supported by the Office of Naval Research Grant #N00014-06-0106.

7.
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret: they show explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

8.
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is applied twice to partition the object scores of respondents and the weights of variable categories. In this way, joint clusters that relate a subgroup of respondents exclusively to a subset of variable categories are obtained. The proposed method provides a low-dimensional map that displays variable category points and the centroids of joint clusters simultaneously. In addition, it offers joint-cluster memberships of variable categories as well as respondents. A Monte Carlo study is conducted to assess the parameter recovery capability of the proposed method based on synthetic data. An empirical application concerning Korean consumers' preferences toward various underwear brands and attributes is presented to demonstrate the effectiveness of the proposed method as compared with 2 relevant extant approaches.

9.
Milligan, Glenn W. Psychometrika, 1980, 45(3): 325-342
An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random seed points were used to start the K-means algorithms. However, two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust with respect to all of the types of error examined.

10.
A common representation of data within the context of multidimensional scaling (MDS) is a collection of symmetric proximity (similarity or dissimilarity) matrices for each of M subjects. There are a number of possible alternatives for analyzing these data, which include: (a) conducting an MDS analysis on a single matrix obtained by pooling (averaging) the M subject matrices, (b) fitting a separate MDS structure for each of the M matrices, or (c) employing an individual differences MDS model. We discuss each of these approaches, and subsequently propose a straightforward new method (CONcordance PARtitioning, or ConPar), which can be used to identify groups of individual-subject matrices with concordant proximity structures. This method collapses the three-way data into a subject × subject dissimilarity matrix, which is subsequently clustered using a branch-and-bound algorithm that minimizes partition diameter. Extensive Monte Carlo testing revealed that, when compared to K-means clustering of the proximity data, ConPar generally provided better recovery of the true subject cluster memberships. A demonstration using empirical three-way data is also provided to illustrate the efficacy of the proposed method.

11.
We replicated Rosenblatt et al.'s (1998) cluster analysis of intake profiles of youth enrolled in a system of care program. The characteristics of a unique sample of 275 children and adolescents with emotional and behavioral disorders (E/BD) who participated in the Santa Barbara County Multiagency Integrated System of Care (MISC) program were examined. A two-step clustering procedure (hierarchical and K-means) was used to evaluate subtypes of youth who were opened to MISC after it had become a stable youth-service program. The results of the Rosenblatt et al. (1998) study were replicated with four identical clusters emerging: Troubled, Troubled and Troubling, Troubling, and At-Risk. Two additional clusters were differentiated: Moderate Troubled, and Moderate Troubled and Troubling. Comparisons across these six clusters show distinct profiles of youth with E/BD. Implications of these findings for developing appropriate service plans and for evaluating systems of care outcomes are discussed.

12.
Cluster differences scaling is a method for partitioning a set of objects into classes and simultaneously finding a low-dimensional spatial representation of K cluster points, to model a given square table of dissimilarities among n stimuli or objects. The least squares loss function of cluster differences scaling, originally defined only on the residuals of pairs of objects that are allocated to different clusters, is extended with a loss component for pairs that are allocated to the same cluster. It is shown that this extension makes the method equivalent to multidimensional scaling with cluster constraints on the coordinates. A decomposition of the sum of squared dissimilarities into contributions from several sources of variation is described, including the appropriate degrees of freedom for each source. After developing a convergent algorithm for fitting the cluster differences model, it is argued that the individual objects and the cluster locations can be jointly displayed in a configuration obtained as a by-product of the optimization. Finally, the paper introduces a fuzzy version of the loss function, which can be used in a successive approximation strategy for avoiding local minima. A simulation study demonstrates that this strategy significantly outperforms two other well-known initialization strategies, and that it has a success rate of 92 out of 100 in attaining the global minimum.

13.
A new nonmetric multidimensional scaling method is devised to analyze three-way data concerning inter-stimulus similarities obtained from many subjects. It is assumed that subjects are classified into a small number of clusters and that the stimulus configuration is specific to each cluster. Under this assumption, the classification of subjects and the scaling used to derive the configurations for clusters are simultaneously performed using an alternating least-squares algorithm. The monotone regression of ordinal similarity data, the scaling of stimuli, and the K-means clustering of subjects are iterated in the algorithm. The method is assessed using a simulation and its practical use is illustrated with the analysis of real data. Finally, some extensions are considered.

14.
This paper synthesizes the results, methodology, and research conducted concerning the K‐means clustering method over the last fifty years. The K‐means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions of K‐means using different metrics or modifications of the original algorithm are given, leading to a unifying treatment of K‐means and some of its extensions. Finally, several future studies are outlined that could enhance the understanding of numerous subtleties affecting the performance of the K‐means method.
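
For readers unfamiliar with the method this review covers, the following is a minimal sketch of the standard alternating (Lloyd-style) K-means iteration with the usual sum-of-squared-error loss. It is not taken from the paper; the function names are hypothetical, and the starting centers are supplied by the caller because the choice of initialization is one of the open issues the abstract mentions.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]],
    init_centers: Sequence[Sequence[float]],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]], float]:
    """Alternate assignment and centroid updates until labels stop changing."""
    centers = [list(c) for c in init_centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse
```

The returned `sse` is the within-cluster sum of squares at convergence. It is only a local optimum in general, so different starting centers can give different results.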

15.
This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that although the usual K-means sum-of-squared-error criterion is not guaranteed to be minimal, a true underlying cluster structure may be more accurately recovered. Also, substantive interpretability seems generally improved when constrained solutions are considered. We illustrate the procedure with several data sets from the literature.
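
The dynamic programming step described here can be sketched as follows. This is an illustration under strong assumptions: the object order is already given (the quadratic assignment heuristic is not shown), each object is a single number rather than a multivariate profile, and the function name `order_constrained_partition` is hypothetical.

```python
from typing import List, Sequence, Tuple


def order_constrained_partition(
    values: Sequence[float], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split an ordered sequence into k contiguous blocks with minimum total
    within-block sum of squares. Returns (sse, [(start, end), ...]) where each
    block is values[start:end]."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of objects")
    # Prefix sums give the sum of squares of any block in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def block_sse(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j] = best sse for the first j objects split into c blocks
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + block_sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], back[c][j] = candidate, i
    # Walk the back-pointers to recover the block boundaries.
    blocks: List[Tuple[int, int]] = []
    j = n
    for c in range(k, 0, -1):
        i = back[c][j]
        blocks.append((i, j))
        j = i
    blocks.reverse()
    return cost[k][n], blocks
```

The recursion is optimal only among partitions that respect the given order, which matches the abstract's remark that the unconstrained sum-of-squared-error criterion is not guaranteed to be minimal.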

16.
Mixture analysis is commonly used for clustering objects on the basis of multivariate data. When the data contain a large number of variables, regular mixture analysis may become problematic, because a large number of parameters need to be estimated for each cluster. To tackle this problem, the mixtures-of-factor-analyzers (MFA) model was proposed, which combines clustering with exploratory factor analysis. MFA model selection is rather intricate, as both the number of clusters and the number of underlying factors have to be determined. To this end, the Akaike (AIC) and Bayesian (BIC) information criteria are often used. AIC and BIC try to identify a model that optimally balances model fit and model complexity. In this article, the CHull (Ceulemans & Kiers, 2006) method, which also balances model fit and complexity, is presented as an interesting alternative model selection strategy for MFA. In an extensive simulation study, the performances of AIC, BIC, and CHull were compared. AIC performs poorly and systematically selects overly complex models, whereas BIC performs slightly better than CHull when considering the best model only. However, when taking model selection uncertainty into account by looking at the first three models retained, CHull outperforms BIC. This especially holds in more complex, and thus more realistic, situations (e.g., more clusters, factors, noise in the data, and overlap among clusters).
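
The trade-off between fit and complexity that AIC and BIC express can be written down directly from their textbook definitions. The sketch below assumes the maximized log-likelihood and the number of free parameters of each candidate model are already available; the function names are hypothetical, and neither the MFA parameter count nor the CHull procedure is reproduced here.

```python
import math
from typing import Dict


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_model(scores: Dict[str, float]) -> str:
    """Return the label of the candidate with the smallest criterion value."""
    return min(scores, key=scores.get)
```

Because ln n exceeds 2 once n is 8 or more, BIC penalizes each extra parameter more heavily than AIC for all but very small samples. This is in line with the abstract's finding that AIC tends to select overly complex models.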

17.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications   Cited by: 3 (self-citations: 0; citations by others: 3)
Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.

18.
This paper develops a new procedure, called stability analysis, for K‐means clustering. Instead of ignoring local optima and only considering the best solution found, this procedure takes advantage of additional information from a K‐means cluster analysis. The information from the locally optimal solutions is collected in an object by object co‐occurrence matrix. The co‐occurrence matrix is clustered and subsequently reordered by a steepest ascent quadratic assignment procedure to aid visual interpretation of the multidimensional cluster structure. Subsequently, measures are developed to determine the overall structure of a data set, the number of clusters and the multidimensional relationships between the clusters.
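
The co-occurrence matrix at the center of this procedure can be sketched as below. The sketch assumes each locally optimal solution is given as a list of cluster labels, and it reports proportions rather than raw counts, which is a choice made for this illustration and may differ from the paper. The function name `cooccurrence_matrix` is hypothetical, and the clustering and reordering of the matrix are not shown.

```python
from typing import List, Sequence


def cooccurrence_matrix(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j) is the share of partitions that place objects i and j
    in the same cluster. Label values need not match across partitions."""
    if not partitions:
        raise ValueError("at least one partition is required")
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must cover the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / len(partitions) for c in row] for row in counts]
```

Pairs with values near 1 are grouped together in almost every local optimum, while intermediate values flag objects whose membership depends on the starting point.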

19.
The minimum‐diameter partitioning problem (MDPP) seeks to produce compact clusters, as measured by an overall goodness‐of‐fit measure known as the partition diameter, which represents the maximum dissimilarity between any two objects placed in the same cluster. Complete‐linkage hierarchical clustering is perhaps the best‐known heuristic method for the MDPP and has an extensive history of applications in psychological research. Unfortunately, this method has several inherent shortcomings that impede the model selection process, such as: (1) sensitivity to the input order of the objects, (2) failure to obtain a globally optimal minimum‐diameter partition when cutting the tree at K clusters, and (3) the propensity for a large number of alternative minimum‐diameter partitions for a given K. We propose that each of these problems can be addressed by applying an algorithm that finds all of the minimum‐diameter partitions for different values of K. Model selection is then facilitated by considering, for each value of K, the reduction in the partition diameter, the number of alternative optima, and the partition agreement among the alternative optima. Using five examples from the empirical literature, we show the practical value of the proposed process for facilitating model selection for the MDPP.
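
The partition diameter defined in this abstract is simple to compute for a given partition. The sketch below assumes a symmetric dissimilarity matrix and a list of cluster labels; the function name `partition_diameter` is hypothetical, and the algorithm that enumerates all minimum‐diameter partitions is not shown.

```python
from typing import Sequence


def partition_diameter(
    dissim: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Largest dissimilarity between any two objects that share a cluster.
    Returns 0.0 when every cluster is a singleton."""
    n = len(labels)
    return max(
        (
            dissim[i][j]
            for i in range(n)
            for j in range(i + 1, n)
            if labels[i] == labels[j]
        ),
        default=0.0,
    )
```

Since the criterion depends only on the single worst within-cluster pair, many different partitions can share the same diameter, which is the source of the alternative optima the abstract discusses.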

20.
A comprehensive analysis of clustering techniques is presented in this paper through their application to data on meteorological conditions. Six partitional and hierarchical clustering techniques (k-means, k-medoids, SOM k-means, Agglomerative Hierarchical Clustering, and Clustering based on Gaussian Mixture Models) with different distance criteria, together with some clustering evaluation measures (Calinski–Harabasz, Davies–Bouldin, Gap and Silhouette criterion clustering evaluation object), present various analyses of the main climatic zones in Spain. Real-life data sets, recorded by AEMET (Spanish Meteorological Agency) at four of its weather stations, are analyzed in order to characterize the actual weather conditions at each location. The clustering techniques process the data on some of the main daily meteorological variables collected at these stations over six years between 2004 and 2010.
