Similar literature
20 similar documents found (search time: 31 ms)
1.
Properties of the Hubert-Arabie adjusted Rand index (cited by: 1; self-citations: 0; cited by others: 1)
This article provides an investigation of cluster validation indices that relates 4 of the indices to the L. Hubert and P. Arabie (1985) adjusted Rand index, the cluster validation measure of choice (G. W. Milligan & M. C. Cooper, 1986). It is shown how these other indices can be "roughly" transformed into the same scale as the adjusted Rand index. Furthermore, in-depth explanations are given of why classification rates should not be used in cluster validation research. The article concludes by summarizing several properties of the adjusted Rand index across many conditions and provides a method for testing the significance of observed adjusted Rand indices.

2.
A macro for calculating the Hubert and Arabie (1985) adjusted Rand statistic is presented. The adjusted Rand statistic gives a measure of classification agreement between two partitions of the same set of objects. The macro is written in the SAS macro language and makes extensive use of SAS/IML software (SAS Institute, 1985a, 1985b). The macro uses two different methods of handling missing values. The default method assumes that each object that has a missing value for the classification category is in its own separate category or cluster for that classification. The optional method places all objects with a missing value for the classification category into the same category for that classification. This study was supported in part by Individual National Research Service Award F32 DA 05283 from the National Institute on Drug Abuse. Requests for the macro code can be sent via BITNET: CUSGPXH @ UCLAMVS. A copy of the macro code can also be obtained by sending a stamped self-addressed mailer and a PC-DOS formatted floppy diskette to Paul Hoffman, 5628 MSA, UCLA, Los Angeles, CA 90024-1557.
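
The two missing-value conventions described in this abstract can be illustrated with a minimal Python sketch. It is not the SAS macro itself: the function name `adjusted_rand_index`, the `missing` parameter, and the use of `None` to mark a missing label are all assumptions made for this illustration. The index is computed from pair counts in the usual Hubert-Arabie form (observed agreement minus its chance expectation, divided by the maximum minus the expectation).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b, missing="separate"):
    """Hubert-Arabie adjusted Rand index for two labelings of the same objects.

    None marks a missing label (an assumption of this sketch).
    missing="separate": each object with a missing label forms its own cluster.
    missing="shared":   all objects with a missing label share one cluster.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    if missing not in ("separate", "shared"):
        raise ValueError("missing must be 'separate' or 'shared'")

    def fill(labels):
        # Tag observed and missing labels so they can never collide.
        filled = []
        for i, lab in enumerate(labels):
            if lab is not None:
                filled.append(("obs", lab))
            elif missing == "separate":
                filled.append(("miss", i))
            else:
                filled.append(("miss", None))
        return filled

    a, b = fill(labels_a), fill(labels_b)
    n = len(a)
    if n < 2:
        return 1.0  # no pairs to compare

    # Number of object pairs placed together in both, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())

    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (maximum - expected)


if __name__ == "__main__":
    first = [0, 0, None, None]
    second = [0, 0, 1, 1]
    print(adjusted_rand_index(first, second, missing="separate"))
    print(adjusted_rand_index(first, second, missing="shared"))
```

In this toy case the two conventions give different values for the same data, which is why the choice of missing-value handling should be reported alongside the statistic.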

3.
Two expectations of the adjusted Rand index (ARI) are compared. It is shown that the expectation derived by Morey and Agresti (1984, Educational and Psychological Measurement, 44, 33) under the multinomial distribution to approximate the exact expectation from the hypergeometric distribution (Hubert & Arabie, 1985, Journal of Classification, 2, 193) provides a poor approximation, and, in some cases, the difference between the two expectations can increase with the sample size. Proofs concerning the minimum and maximum difference between the two expectations are provided, and it is shown through simulation that the ARI can differ significantly depending on which expectation is used. Furthermore, when compared in a hypothesis testing framework, multinomial approximation overly favours the null hypothesis.

4.
A variable-selection heuristic for K-means clustering (cited by: 4; self-citations: 0; cited by others: 4)
One of the most vexing problems in cluster analysis is the selection and/or weighting of variables in order to include those that truly define cluster structure, while eliminating those that might mask such structure. This paper presents a variable-selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. The heuristic was subjected to Monte Carlo testing across more than 2200 datasets with known cluster structure. The results indicate the heuristic is extremely effective at eliminating masking variables. A cluster analysis of real-world financial services data revealed that using the variable-selection heuristic prior to the K-means algorithm resulted in greater cluster stability.

5.
There are various optimization strategies for approximating, through the minimization of a least-squares loss function, a given symmetric proximity matrix by a sum of matrices each subject to some collection of order constraints on its entries. We extend these approaches to include components in the approximating sum that satisfy what are called the strongly-anti-Robinson (SAR) or circular strongly-anti-Robinson (CSAR) restrictions. A matrix that is SAR or CSAR is representable by a particular graph-theoretic structure, where each matrix entry is reproducible from certain minimum path lengths in the graph. One published proximity matrix is used extensively to illustrate the types of approximation that ensue when the SAR or CSAR constraints are imposed. The authors are indebted to Boris Mirkin who first noted in a personal communication to one of us (LH, April 22, 1996) that the optimization method for fitting anti-Robinson matrices in Hubert and Arabie (1994) should be extendable to the fitting of strongly anti-Robinson matrices as well.

6.
MAPCLUS: A mathematical programming approach to fitting the ADCLUS model (cited by: 6; self-citations: 0; cited by others: 6)
We present a new algorithm, MAPCLUS (MAthematical Programming CLUStering), for fitting the Shepard-Arabie ADCLUS (for ADditive CLUStering) model. MAPCLUS utilizes an alternating least squares method combined with a mathematical programming optimization procedure based on a penalty function approach, to impose discrete (0,1) constraints on parameters defining cluster membership. This procedure is supplemented by several other numerical techniques (notably a heuristically based combinatorial optimization procedure) to provide an efficient general-purpose computer implemented algorithm for obtaining ADCLUS representations. MAPCLUS is illustrated with an application to one of the examples given by Shepard and Arabie using the older ADCLUS procedure. The MAPCLUS solution uses half as many clusters to achieve nearly the same level of goodness-of-fit. Finally, we consider an extension of the present approach to fitting a three-way generalization of the ADCLUS model, called INDCLUS (INdividual Differences CLUStering). We are indebted to Scott A. Boorman, W. K. Estes, J. A. Hartigan, Lawrence J. Hubert, Carol L. Krumhansl, Joseph B. Kruskal, Sandra Pruzansky, Roger N. Shepard, Edward J. Shoben, Sigfrid D. Soli, and Amos Tversky for helpful discussions of this work, as well as the anonymous referees for their suggestions and corrections on an earlier version of this paper. We are also grateful to Pamela Baker and Dan C. Knutson for technical assistance. The research reported here was supported in part by LEAA Grant 78-NI-AX-0142 and NSF Grants SOC76-24512 and SOC76-24394.

7.
Steinley D. Psychological Methods, 2006, 11(2): 178-192
Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate distributions of clusters; and (e) various multidimensional data structures. The results are evaluated in terms of the Hubert-Arabie adjusted Rand index, and several observations concerning the performance of K-means clustering are made. Finally, the article concludes with the proposal of a diagnostic technique indicating when the partitioning given by a K-means cluster analysis can be trusted. By combining the information from several observable characteristics of the data (number of clusters, number of variables, sample size, etc.) with the prevalence of unique local optima in several thousand implementations of the K-means algorithm, the author provides a method capable of guiding key data-analysis decisions.

8.
This study examines the structural invariance of Holland's (1973, 1985) vocational interest model across gender. Evidence of gender differences in the fit of Holland's model was sought by submitting 14 (7 male; 7 female) previously published Strong Interest Inventory (SII) General Occupational Themes (GOT) scale correlation matrices to multiple structural analytic techniques. Randomization tests of hypothesized order relations (Hubert & Arabie, 1987) and single sample confirmatory factor analyses (CFA) indicated a moderate to strong correspondence between GOT data and Holland's circular order and circumplex models. Randomization tests of differences in model–data fit, and two-sample CFA indicated that these models are no more or less accurate a representation of the observed data for men than for women. Additional analyses aimed at identifying gender differences in the misfit of specific aspects of Holland's model also yielded no evidence of differential fit.

9.
The misclassification error distance and the adjusted Rand index are two of the most common criteria used to evaluate the performance of clustering algorithms. This paper provides an in-depth comparison of the two criteria, with the aim of better understanding exactly what they measure, their properties and their differences. Starting from their population origins, the investigation includes many data analysis examples and the study of particular cases in great detail. An exhaustive simulation study provides insight into the criteria distributions and reveals some previous misconceptions.
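
As a rough illustration of the first criterion, the sketch below computes a misclassification error distance under a commonly used definition: the smallest proportion of objects that disagree between two labelings, minimized over all one-to-one matchings of cluster labels. That definition, the function name `misclassification_error_distance`, and the brute-force search over permutations are assumptions of this sketch and are not taken from the paper; the search is only practical for a small number of clusters.

```python
from collections import Counter
from itertools import permutations


def misclassification_error_distance(labels_a, labels_b):
    """Smallest proportion of mismatched objects over all label matchings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must not be empty")

    # Distinct cluster names, in order of first appearance.
    names_a = list(dict.fromkeys(labels_a))
    names_b = list(dict.fromkeys(labels_b))
    counts = Counter(zip(labels_a, labels_b))

    # Make the first labeling the one with at least as many clusters,
    # so every cluster of the second can be matched to a distinct cluster.
    if len(names_a) < len(names_b):
        names_a, names_b = names_b, names_a
        counts = Counter(zip(labels_b, labels_a))

    best_agreement = 0
    for chosen in permutations(names_a, len(names_b)):
        # chosen[i] is the cluster of the first labeling matched to names_b[i].
        agreement = sum(counts[(a, b)] for a, b in zip(chosen, names_b))
        best_agreement = max(best_agreement, agreement)
    return 1 - best_agreement / n


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [0, 0, 1, 1, 1, 1]
    print(misclassification_error_distance(truth, found))
```

Unlike the adjusted Rand index, this quantity is not corrected for chance agreement, which is one reason the two criteria can rank the same pair of clusterings differently.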

10.
Synchronous coordination between two body segments departs from phase locking at 0 or π radians when the segments are asymmetrical. In models of coordination dynamics, this detuning is typically quantified by Δω = (ω₁ − ω₂), where ω₁ and ω₂ are the uncoupled frequencies of the two segments. An experiment is reported in which the magnitude of Δω ≠ 0 was satisfied by different ratios Ω of ω₁ and ω₂. The degree of detuning was found to vary systematically with Ω and Δω. This result corroborates previous research using the complementary manipulation of varying Δω for a fixed Ω. A challenge for future dynamical modeling is identifying precisely how the detuning quantity incorporates both the absolute and relative differences in the uncoupled segmental frequencies.

11.

For simplicity, most of the literature introduces the concept of definitional equivalence only for disjoint languages. In a recent paper, Barrett and Halvorson introduce a straightforward generalization to non-disjoint languages and they show that their generalization is not equivalent to intertranslatability in general. In this paper, we show that their generalization is not transitive and hence it is not an equivalence relation. Then we introduce another formalization of definitional equivalence due to Andréka and Németi which is equivalent to the Barrett–Halvorson generalization in the case of disjoint languages. We show that the Andréka–Németi generalization is the smallest equivalence relation containing the Barrett–Halvorson generalization and it is equivalent to intertranslatability, which is another definition for definitional equivalence, even for non-disjoint languages. Finally, we investigate which definitions for definitional equivalences remain equivalent when we generalize them for theories in non-disjoint languages.


12.
A Monte Carlo evaluation of thirty internal criterion measures for cluster analysis was conducted. Artificial data sets were constructed with clusters which exhibited the properties of internal cohesion and external isolation. The data sets were analyzed by four hierarchical clustering methods. The resulting values of the internal criteria were compared with two external criterion indices which determined the degree of recovery of correct cluster structure by the algorithms. The results indicated that a subset of internal criterion measures could be identified which appear to be valid indices of correct cluster recovery. Indices from this subset could form the basis of a permutation test for the existence of cluster structure or a clustering algorithm.

13.
The techniques of multidimensional scaling were used to study the numerical behavior of twelve measures of distance between partitions, as applied to partition lattices of four different sizes. The results offer additional support for a system of classifying partition metrics, as proposed by Boorman (1970), and Boorman and Arabie (1972). While the scaling solutions illuminated differences between the measures, at the same time the particular data with which the measures were concerned offered a basis both for counterexamples to some common assumptions about multidimensional scaling and for some conjectures as to the nature of scaling solutions. The implications of the latter findings for selected examples from the literature are considered. In addition, the methods of partition data analysis discussed here are applied to an example using sociobiological data. Finally, an argument is made against undue emphasis upon interpreting dimensions in nonmetric scaling solutions.

14.
Partitioning indices associated with the within-cluster sums of pairwise dissimilarities often exhibit a systematic bias towards clusters of a particular size, whereas minimization of the partition diameter (i.e. the maximum dissimilarity element across all pairs of objects within the same cluster) does not typically have this problem. However, when the partition-diameter criterion is used, there is often a myriad of alternative optimal solutions that can vary significantly with respect to their substantive interpretation. We propose a bicriterion partitioning approach that considers both diameter and within-cluster sums in the optimization problem and facilitates selection from among the alternative optima. We developed several MATLAB-based exchange algorithms that rapidly provide excellent solutions to bicriterion partitioning problems. These algorithms were evaluated using synthetic data sets, as well as an empirical dissimilarity matrix.
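
The two criteria named in this abstract can be made concrete with a small Python sketch. It computes the partition diameter and the within-cluster sum of pairwise dissimilarities for a given labeling, then picks a partition by exhaustive search, breaking ties in diameter by the within-cluster sum. The lexicographic tie-break and the exhaustive search are assumptions chosen for illustration; they are not the authors' MATLAB exchange algorithms, and the function names and toy data are hypothetical.

```python
from itertools import combinations, product


def partition_criteria(dissim, labels):
    """Return (diameter, within_sum) for one partition.

    dissim: symmetric n x n dissimilarity matrix (list of lists).
    labels: cluster label of each of the n objects.
    """
    diameter = 0
    within_sum = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within_sum += dissim[i][j]
            diameter = max(diameter, dissim[i][j])
    return diameter, within_sum


def best_partition_lexicographic(dissim, k):
    """Exhaustively find a k-cluster partition with minimum diameter,
    using the within-cluster sum to choose among diameter-optimal ties.
    Only feasible for very small n."""
    n = len(dissim)
    best = None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        score = partition_criteria(dissim, labels)
        if best is None or score < best[0]:
            best = (score, labels)
    return best


if __name__ == "__main__":
    # Toy data: five points on a line; dissimilarity is absolute distance.
    points = [0, 3, 4, 6, 8]
    dissim = [[abs(p - q) for q in points] for p in points]
    print(partition_criteria(dissim, [0, 0, 1, 1, 1]))
    print(best_partition_lexicographic(dissim, 2))
```

In the toy data, two different two-cluster partitions share the minimum diameter but differ in their within-cluster sums, which is the kind of tie the second criterion helps resolve.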

15.
The category adjustment model (CAM) proposes that estimates of inexactly remembered stimuli are adjusted toward the central value of the category of which the stimuli are members. Adjusting estimates toward the average value of all category instances, properly weighted for memory uncertainty, maximizes the average accuracy of estimates. Thus far, the CAM has been tested only with symmetrical category distributions in which the central stimulus value is also the mean. We report two experiments using asymmetric (skewed) distributions in which there is more than one possible central value: one where the frequency distribution shifts over the course of time, and the other where the frequency distribution is skewed. In both cases, we find that people adjust estimates toward the category’s running mean, which is consistent with the CAM but not with alternative explanations for the adjustment of stimuli toward a category’s central value.
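
The idea of adjusting an estimate toward a running category mean, weighted by memory uncertainty, can be sketched in a few lines of Python. The weighting used here is the standard precision-weighted (Bayesian shrinkage) form, which is an assumption of this sketch; the abstract does not state the exact formula. The function names and the variance parameters are hypothetical.

```python
def running_mean(values):
    """Mean of all category instances seen so far, after each new instance."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def cam_estimate(memory_trace, category_mean, memory_var, category_var):
    """Combine an inexact memory trace with the category mean.

    The weight on the memory trace shrinks as memory uncertainty
    (memory_var) grows relative to the spread of the category
    (category_var). Assumed precision-weighted form, for illustration.
    """
    if memory_var < 0 or category_var < 0 or memory_var + category_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    weight = category_var / (category_var + memory_var)
    return weight * memory_trace + (1 - weight) * category_mean


if __name__ == "__main__":
    seen = [2, 4, 6, 20]            # a skewed run of stimulus values
    center = running_mean(seen)[-1]  # running mean after the last stimulus
    # A noisy memory of the last stimulus is pulled toward the running mean.
    print(cam_estimate(20, center, memory_var=4, category_var=4))
```

With zero memory uncertainty the estimate equals the remembered value, and as memory uncertainty grows the estimate moves toward the running mean.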

16.
This paper illustrates two formal models for psychiatric classification. The first model, called a hierarchical or tree structure, requires patient categories to be disjoint or strictly nested. The second model, called the generally overlapping or network model, allows patient categories to cut across each other in a variety of different ways. Thus, patient groups can be disjoint, strictly nested (as in a hierarchy), or partially overlapping. To derive classification schemes consistent with the structural models, two different clustering techniques were applied to interpatient similarity data collected on 50 psychiatric patients. A hierarchical clustering technique was applied to the similarity data to obtain a hierarchical classification. To obtain a generally overlapping classification, Peay's cliquing procedure was applied to the same data. Two criteria were used to compare the clustering solutions. First, a solution's goodness-of-fit to the original data was examined by calculating the proportion of variance accounted for by cluster categories. Second, the predictive accuracy of a solution was analyzed by looking at the categories' ability to predict treatment assignment. The generally overlapping solution produced the best fit to the original similarity data; however, the hierarchical solution's clusters tended to be more readily interpretable in terms of psychiatric syndromes. Both clustering solutions were relatively poor predictors of treatment assignment. It was concluded that the hierarchical and generally overlapping approaches, although not conclusively demonstrated, represented promising models for psychiatric classification.

17.
Answer similarity indices were developed to detect pairs of test takers who may have worked together on an exam or instances in which one test taker copied from another. For any pair of test takers, an answer similarity index can be used to estimate the probability that the pair would exhibit the observed response similarity or a greater degree of similarity under the assumption that the test takers worked independently. To identify groups of test takers with unusually similar response patterns, Wollack and Maynes suggested conducting cluster analysis using probabilities obtained from an answer similarity index as measures of distance. However, interpretation of results at the cluster level can be challenging because the method is sensitive to the choice of clustering procedure and only enables probabilistic statements about pairwise relationships. This article addresses these challenges by presenting a statistical test that can be applied to clusters of examinees rather than pairs. The method is illustrated with both simulated and real data.

18.
A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the objects in Part B to their nearest centroid in Part A, and then independently cluster the Part B objects. One then examines how close the two partitions are (say, by the Rand measure). Another proposal is to split the data into k parts, and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as sample size becomes large and the variables exhibit higher correlations. The authors express their thanks to the Sol C. Snider Entrepreneurial Center, Wharton School, for support of this project.

19.
Vermunt JK. Psychological Methods, 2011, 16(1): 82-8; discussion 89-92
Steinley and Brusco (2011) presented the results of a huge simulation study aimed at evaluating cluster recovery of mixture model clustering (MMC) both for the situation where the number of clusters is known and is unknown. They derived rather strong conclusions on the basis of this study, especially with regard to the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that the performance of KM may be equal to MMC in certain situations, which are primarily the situations investigated by Steinley and Brusco. However, a weakness of the paper is the failure to investigate many important real-world situations where theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides some additional simulation results that show that KM may be much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood and comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.

20.
Abstract

Evidence suggests that certain indices of stage of HIV disease are determinants of psychological distress, although information is lacking on how disease stage impacts on multiple domains of adjustment. The present study aimed: (1) to explore differences among clinical stages of HIV on measures of psychosocial adjustment, and (2) to explore the relationship between indices of psychosocial adjustment to HIV and self-report measures of physical health. Ninety-six HIV-infected persons and 33 HIV seronegative comparison group participants were interviewed and completed self-administered scales. Participants were divided into four groups (the independent variable): a comparison group and three HIV groups, representing the three clinical indices of illness stage (asymptomatic, early symptomatic and AIDS). Three subjective health indices included number of HIV-related symptoms, global health rating, and T4 count. The dependent variables included 5 psychosocial adjustment measures. Results indicated that social and instrumental domains of adjustment were significantly associated with both clinical stage and all 3 subjective health indices. Levels of psychological distress were associated with number of physical symptoms and global health rating, but were unrelated to clinical stage and T4 count. Emotional and existential concerns were unrelated to all indices of illness stage.
