Similar literature
1.
This article provides a large-scale investigation into several properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson, 2002). The focus is on the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated. To provide a link to the current literature, comparisons are made with K-means clustering in 3 detailed Monte Carlo studies. The findings have implications for applied researchers: mixture-model clustering techniques performed best when the covariance structure and number of clusters were known. However, as information about the shape and number of clusters became unavailable, performance degraded for both K-means clustering and mixture-model clustering.

2.
Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.

3.
Brusco MJ. Psychological Methods, 2004, 9(4): 510-523
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal.

4.
A split-sample replication stopping rule for hierarchical cluster analysis is compared with the internal criterion previously found superior by Milligan and Cooper (1985) in their comparison of 30 different procedures. The number and extent of overlap of the latent population distributions were systematically varied in the present evaluation of stopping-rule validity. Equal and unequal population base rates were also considered. Both stopping rules correctly identified the actual number of populations when there was essentially no overlap and clusters occupied visually distinct regions of the measurement space. The replication criterion, which is evaluated by clustering of cluster means from preliminary analyses that are accomplished on random partitions of an original data set, was superior as the degree of overlap in population distributions increased. Neither method performed adequately when overlap obliterated visually discernible density nodes. This research was supported in part by NIMH grant 5R01 MH 32457 14.

5.
In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object-by-variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin’s additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one, or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient-by-symptom data matrix.
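
The reconstruction rule in point (3) can be illustrated with a small sketch. This is not the ADPROCLUS program or its interface; the function name `reconstruct`, the membership matrix, and the profile values are hypothetical and chosen only to show how overlapping memberships add up.

```python
def reconstruct(memberships, profiles):
    """Model-implied scores: add the profiles of every cluster an object belongs to."""
    n_vars = len(profiles[0])
    scores = []
    for row in memberships:
        total = [0.0] * n_vars
        for belongs, profile in zip(row, profiles):
            if belongs:
                total = [t + p for t, p in zip(total, profile)]
        scores.append(total)
    return scores

# Hypothetical example: 4 objects, 2 overlapping clusters, 3 variables.
memberships = [
    [1, 0],  # object in cluster 1 only
    [0, 1],  # object in cluster 2 only
    [1, 1],  # object in both clusters (overlap)
    [0, 0],  # object in no cluster
]
profiles = [
    [2.0, 0.0, 1.0],  # variable profile of cluster 1
    [0.5, 3.0, 0.0],  # variable profile of cluster 2
]
model_scores = reconstruct(memberships, profiles)
```

The object that belongs to both clusters receives the sum of the two profiles, and the object that belongs to no cluster is reconstructed as all zeros.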

6.
Vermunt JK. Psychological Methods, 2011, 16(1): 82-8; discussion 89-92
Steinley and Brusco (2011) presented the results of a huge simulation study aimed at evaluating cluster recovery of mixture model clustering (MMC) both when the number of clusters is known and when it is unknown. They derived rather strong conclusions on the basis of this study, especially with regard to the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that the performance of KM may be equal to MMC in certain situations, which are primarily the situations investigated by Steinley and Brusco. However, a weakness of the paper is the failure to investigate many important real-world situations where theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides some additional simulation results that show that KM may be much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood and comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
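
The equivalence mentioned in the last sentence can be sketched numerically. The abstract does not spell out the restrictions, so the sketch assumes the commonly cited special case: normal components sharing one spherical covariance (variance `sigma2`) and no mixing-proportion term. Under that assumption the classification log-likelihood of a hard partition equals a constant minus SSE / (2 * sigma2), so maximizing one minimizes the other. All names and data are hypothetical.

```python
import math

def centroids(points, labels):
    """Mean of the points assigned to each cluster label."""
    groups = {}
    for x, z in zip(points, labels):
        groups.setdefault(z, []).append(x)
    return {z: sum(xs) / len(xs) for z, xs in groups.items()}

def sse(points, labels):
    """K-means criterion: squared distance of each point to its cluster mean."""
    mu = centroids(points, labels)
    return sum((x - mu[z]) ** 2 for x, z in zip(points, labels))

def classification_loglik(points, labels, sigma2=1.0):
    """Log-likelihood of a hard partition under equal-variance normal components
    (assumed restriction; mixing proportions are left out)."""
    mu = centroids(points, labels)
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (x - mu[z]) ** 2 / (2 * sigma2)
        for x, z in zip(points, labels)
    )

# Hypothetical one-dimensional data with two obvious groups.
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
good = [0, 0, 0, 1, 1, 1]  # partition that follows the groups
bad = [0, 1, 0, 1, 0, 1]   # partition that mixes the groups

# Constant term of the log-likelihood (with sigma2 = 1.0); it does not depend on the partition.
const = -0.5 * len(points) * math.log(2 * math.pi * 1.0)
```

In this sketch the partition with the smaller SSE is also the one with the larger classification log-likelihood, and the two quantities differ only by the constant `const` and the factor 1 / (2 * sigma2).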

7.
Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of a cluster solution based on the observed sum-of-squares error as compared to the minimum sum-of-squares error. The author was partially supported by the Office of Naval Research Grant #N00014-06-0106.

8.
The validity of a Personality Inventory for Children-Revised edition (PIC-R) typology was examined in a sample of 323 children aged 6–16 years. These children had been referred to a children's mental health centre for neuropsychological assessment. In study 1, K-means cluster analysis (k = 12) was applied to the PIC clinical scales in an attempt to replicate the 12 clusters identified by Gdowski, Lachar, and Kline (1985). Partial cluster replication was achieved. Examination of the obtained clusters revealed significant overlap, suggesting that fewer clusters would represent an optimal solution. In study 2, a two-stage cluster analysis yielded a seven-cluster solution consistent with several key forms of psychopathology previously reported in the literature using specific neuropsychological populations. Identified subtypes included profiles characterized as: normal, cognitive deficit, cognitive deficit with internalized psychopathology, cognitive deficit with social impairment, cognitive deficit with hyperactivity, cognitive deficit with both internalized and externalized psychopathology, and combined internalized and externalized psychopathology without a cognitive deficit component.

9.
Recent cluster analytic research with alcoholic inpatients has demonstrated the existence of several Millon Clinical Multiaxial Inventory (MCMI) clusters that appear to be consistent across different subject samples. The validity of these data would be strengthened by a statistical demonstration of the similarity of attained clusters across studies, a demonstration of concordance of subject classification across different clustering techniques on the same data set, and the inclusion of external, independent measures against which to evaluate the predictive validity of the cluster typology. We found a high level of concordance in subject classification across different clustering methods on the same data set and a high level of agreement with cluster typologies attained in previous studies. Subsequent multivariate analyses employing independent scales measuring various aspects of alcohol use confirmed differences among cluster members on perceived benefits of alcohol use and deleterious effects of alcohol use. The prominent differences in alcohol use along with a rationale for their development are discussed.

11.
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

12.
Mixture analysis is commonly used for clustering objects on the basis of multivariate data. When the data contain a large number of variables, regular mixture analysis may become problematic, because a large number of parameters need to be estimated for each cluster. To tackle this problem, the mixtures-of-factor-analyzers (MFA) model was proposed, which combines clustering with exploratory factor analysis. MFA model selection is rather intricate, as both the number of clusters and the number of underlying factors have to be determined. To this end, the Akaike (AIC) and Bayesian (BIC) information criteria are often used. AIC and BIC try to identify a model that optimally balances model fit and model complexity. In this article, the CHull (Ceulemans & Kiers, 2006) method, which also balances model fit and complexity, is presented as an interesting alternative model selection strategy for MFA. In an extensive simulation study, the performances of AIC, BIC, and CHull were compared. AIC performs poorly and systematically selects overly complex models, whereas BIC performs slightly better than CHull when considering the best model only. However, when taking model selection uncertainty into account by looking at the first three models retained, CHull outperforms BIC. This especially holds in more complex, and thus more realistic, situations (e.g., more clusters, factors, noise in the data, and overlap among clusters).
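
The fit-versus-complexity trade-off behind AIC and BIC can be made concrete with their standard definitions, AIC = -2 log L + 2k and BIC = -2 log L + k ln(n), where k is the number of free parameters and n the sample size. The sketch below uses made-up log-likelihoods and parameter counts, not results from the study, and it does not implement CHull.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: the penalty per parameter grows with ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical candidates: (maximized log-likelihood, number of free parameters).
candidates = {
    "simple": (-520.0, 10),
    "complex": (-500.0, 25),
}
n_obs = 200  # hypothetical sample size

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
```

With these illustrative numbers AIC prefers the more heavily parameterized model while BIC prefers the simpler one, because BIC's per-parameter penalty, ln(200), is larger than AIC's penalty of 2.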

13.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified, while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.
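
For readers unfamiliar with Kohonen-style learning, the following is a minimal sketch of the basic winner-take-all update on one-dimensional data: the prototype closest to each sample is moved a fraction of the way toward it. It is a simplification under stated assumptions, not the networks evaluated in the study; it has no neighborhood function and no conscience mechanism, and the function name, learning rate, and data are hypothetical.

```python
def train_winner_take_all(samples, prototypes, rate=0.5, epochs=10):
    """Competitive learning: move only the nearest prototype toward each sample."""
    weights = list(prototypes)
    for _ in range(epochs):
        for x in samples:
            # The winning unit is the prototype closest to the current sample.
            winner = min(range(len(weights)), key=lambda j: abs(x - weights[j]))
            weights[winner] += rate * (x - weights[winner])
    return weights

# Hypothetical data: two well-separated groups near 0.1 and 5.1.
samples = [0.0, 0.2, 5.0, 5.2]
weights = train_winner_take_all(samples, prototypes=[1.0, 4.0])
```

After training, each prototype sits inside the range of one group, so assigning every sample to its nearest prototype yields a two-cluster partition comparable to what K-means would produce on the same data.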

14.
Four experiments are reported that demonstrate the benefits of clustering by spatial proximity in spatial serial recall and provide support for the notion that hierarchical coding underpins the retention of clustered sequences in spatial working memory. Sequences segregated by spatial clusters increased serial recall performance at different levels of sequence length in a variation of the Corsi test and produced a faster initial response time (RT), which indicates that they afforded data-reducing processes. RT at cluster boundaries increased in parallel with the number of items forming the clusters, suggesting that subroutines of different length were responsible for the ordering of items within clusters of different size. Evidence for hierarchical coding was also obtained in a serial recognition task, indicating that this type of representation pertains to the retention of the sequences rather than exclusively to the organisation of the motor plan for the reproduction of the sequences.

15.
Summary: Acute (n=179) and chronic (n=113) aphasic populations were studied by numerical taxonomy. Clustering based on the objective and standardized language scores of the WAB yielded significant differences for the acute and chronic groups. The overlap between objective clustering and clinical typology, also based on test scores, was sufficient to allow us to interpret the data in clinical terms. The decrease of the global cluster and of Wernicke's cluster and the appearance of new clusters, such as the mixed global-Broca's, the partially recovered Broca's-mild anomic, and the recovered Wernicke's-semantic groups, and the further dichotomy of afferent and efferent conduction clusters were observed. Recovering patients are often reclassified as anomics; this changes the anomic clusters, loading one with patients who fall into the recovered category. Chronic clusters appeared more distinct with less overlap. The trends in both data sets were investigated by principal components analysis. This showed that all language scores contributed to the first component in both populations fairly evenly. Therefore, the main contribution to the first component was severity (the combination of scores). Comprehension and fluency were the major contributors to the second root in both populations, indicating the diagnostic significance of these parameters. Repetition featured more prominently in the second root of the acute than of the chronic population.

16.
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent. The authors would like to express their appreciation to a number of individuals who provided assistance during the conduct of this research. Those who deserve recognition include Roger Blashfield, John Crawford, John Gower, James Lingoes, Wansoo Rhee, F. James Rohlf, Warren Sarle, and Tom Soon.

17.
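Application of the analysis-of-variance method of job evaluation to establishing job grades
The technical core of the analysis-of-variance (ANOVA) method of job evaluation is to determine, through ANOVA, the weights of the components of a job's comparable worth. The validity and practicality of this method have been verified in setting wage standards in support of internal distribution reform in state-owned enterprises, and it has shown general methodological significance: a job evaluation value is a linear mapping of job worth, so any problem involving differences in job worth can be addressed with this method. The present study applies the ANOVA method to provide technical support and a scientific basis for establishing unified job-grade standards in the staff-rank system reform of the state-owned financial system. In the evaluation procedure, the first three steps (job analysis, job classification, and computation of job evaluation values) are the same as in setting wage standards. Next, the range of variation of the evaluation values for each job class is estimated (represented by the 95% confidence interval), and the job classes are merged into broad categories by management level. Finally, a divisor that reasonably partitions the ranges of variation of the broad management-level categories is sought and used to divide the whole range of evaluation values. For an evaluation of 841 samples covering 452 jobs in one banking system, the resulting number of job grades, and the grade span and rank order of each broad job category, were consistent with that banking system's personnel-management experience and reform plans.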

18.
19.
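Guo Lei, Yang Jing, Song Naiqing. Journal of Psychological Science (《心理科学》), 2018, (3): 735-742
Cluster analysis has been applied successfully in cognitive diagnostic assessment (CDA). The most widely used clustering method is the K-means algorithm, and studies have shown that K-means clusters well in CDA. Because spectral clustering usually classifies better than K-means, this study introduces spectral clustering into CDA and examines how the attribute hierarchy structure, the number of attributes, the sample size, and the error (slip) rate affect the method. The findings were: (1) spectral clustering yields better clustering results than K-means and is more robust, especially under more demanding experimental conditions; (2) clustering is best for the linear structure, similar for the convergent and divergent structures, and poorer for the independent structure; (3) clustering performance declines as the number of attributes and the error rate increase; (4) clustering performance improves somewhat as the sample size increases, although K-means sometimes shows the opposite result.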

20.
Additive clustering provides a conceptually simple and potentially powerful approach to modeling the similarity relationships between stimuli. The ability of additive clustering models to accommodate similarity data, however, typically arises through the incorporation of large numbers of parameterized clusters. Accordingly, for the purposes of both model generation and model comparison, it is necessary to develop quantitative evaluative measures of additive clustering models that take into account both data-fit and complexity. Using a previously developed probabilistic formulation of additive clustering, the Bayesian Information Criterion is proposed for this role, and its application demonstrated. Limitations inherent in this approach, including the assumption that model complexity is equivalent to cluster cardinality, are discussed. These limitations are addressed by applying the Laplacian approximation of a marginal probability density, from which a measure of cluster structure complexity is derived. Using this measure, a preliminary investigation is made of the various properties of cluster structures that affect additive clustering model complexity. Among other things, these investigations show that, for a fixed number of clusters, a model with a strictly nested cluster structure is the least complicated, while a model with a partitioning cluster structure is the most complicated. Copyright 2001 Academic Press.
