Perhaps the most common criterion for partitioning a data set is the minimization of the within-cluster sums of squared deviations from cluster centroids. Although optimal solution procedures for within-cluster sums of squares (WCSS) partitioning are computationally feasible for small data sets, heuristic procedures are required for most practical applications in the behavioral sciences. We compared the performances of nine prominent heuristic procedures for WCSS partitioning across 324 simulated data sets representative of a broad spectrum of test conditions. Performance comparisons focused on both percentage deviation from the “best-found” WCSS values and recovery of true cluster structure. A real-coded genetic algorithm and variable neighborhood search heuristic were the most effective methods; however, a straightforward two-stage heuristic algorithm, HK-means, also yielded exceptional performance. A follow-up experiment using 13 empirical data sets from the clustering literature generally supported the results of the experiment using simulated data. Our findings have important implications for behavioral science researchers, whose theoretical conclusions could be adversely affected by poor algorithmic performances.
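
The two quantities named in this abstract, the WCSS criterion and the percentage deviation from a best-found value, can be illustrated with a minimal sketch. The function names `wcss` and `pct_deviation` and the toy data are assumptions made for illustration; none of the nine heuristics compared in the study is reproduced here.

```python
def wcss(points, labels):
    """Within-cluster sum of squared deviations from cluster centroids.

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum((m[j] - centroid[j]) ** 2 for m in members for j in range(dim))
    return total


def pct_deviation(value, best_found):
    """Percentage by which a WCSS value exceeds the best-found WCSS value."""
    return 100.0 * (value - best_found) / best_found


# Hypothetical toy data: two well-separated pairs of points on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
good = wcss(points, [0, 0, 1, 1])   # natural grouping
poor = wcss(points, [0, 1, 0, 1])   # interleaved grouping
```

A heuristic that stops at the interleaved partition would show a large percentage deviation from the natural grouping, which is the kind of gap the comparison measures.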

Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.
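
The p-median objective described above can be sketched as follows. This is a minimal illustration that assumes a precomputed dissimilarity matrix, and it swaps in exhaustive enumeration for the three-stage procedure of the abstract, so it is only practical for very small problems. The names `p_median_cost` and `brute_force_p_median` are hypothetical.

```python
from itertools import combinations


def p_median_cost(dist, centers):
    """Sum, over all objects, of the dissimilarity to the nearest chosen center."""
    return sum(min(row[c] for c in centers) for row in dist)


def brute_force_p_median(dist, p):
    """Try every set of p centers; a stand-in for an exact method on tiny inputs."""
    n = len(dist)
    return min(combinations(range(n), p), key=lambda cs: p_median_cost(dist, cs))


# Hypothetical toy data: six objects on a line, absolute difference as dissimilarity.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
centers = brute_force_p_median(dist, 2)
cost = p_median_cost(dist, centers)
```

Unlike K-means centroids, the selected centers are always actual objects, so the criterion works with any dissimilarity measure.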

Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of a cluster solution based on the observed sum-of-squares error as compared to the minimum sum-of-squares error. The author was partially supported by the Office of Naval Research Grant #N00014-06-0106.

The seriation of proximity matrices is an important problem in combinatorial data analysis and can be conducted using a variety of objective criteria. Some of the most popular criteria for evaluating an ordering of objects are based on (anti-) Robinson forms, which reflect the pattern of elements within each row and/or column of the reordered matrix when moving away from the main diagonal. This paper presents a branch-and-bound algorithm that can be used to seriate a symmetric dissimilarity matrix by identifying a reordering of rows and columns of the matrix optimizing an anti-Robinson criterion. Computational results are provided for several proximity matrices from the literature using four different anti-Robinson criteria. The results suggest that with respect to computational efficiency, the branch-and-bound algorithm is generally competitive with dynamic programming. Further, because it requires much less storage than dynamic programming, the branch-and-bound algorithm can provide guaranteed optimal solutions for matrices that are too large for dynamic programming implementations.
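
To make the anti-Robinson idea concrete, the sketch below counts violations for a given ordering. A reordered dissimilarity matrix is in anti-Robinson form when entries never decrease on moving away from the main diagonal within a row or column. The unweighted violation count used here is an assumption chosen for illustration; the abstract does not specify its four criteria, and the function name `anti_robinson_violations` is hypothetical.

```python
def anti_robinson_violations(d, order):
    """Count row and column anti-Robinson violations of d under the given order.

    d: symmetric dissimilarity matrix (list of lists); order: permutation of indices.
    """
    n = len(order)
    count = 0
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                i, j, k = order[a], order[b], order[c]
                # Row check: moving right from the diagonal should not decrease.
                if d[i][j] > d[i][k]:
                    count += 1
                # Column check: moving up from the diagonal should not decrease.
                if d[j][k] > d[i][k]:
                    count += 1
    return count


# Hypothetical toy data: four objects on a line, absolute difference as dissimilarity.
values = [0, 1, 3, 6]
d = [[abs(a - b) for b in values] for a in values]
in_order = anti_robinson_violations(d, [0, 1, 2, 3])
swapped = anti_robinson_violations(d, [0, 2, 1, 3])
```

A seriation method would search over orderings for one that minimizes such a count (or a weighted variant); the count itself only evaluates a candidate ordering.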

This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that although the usual K-means sum-of-squared-error criterion is not guaranteed to be minimal, a true underlying cluster structure may be more accurately recovered. Also, substantive interpretability seems generally improved when constrained solutions are considered. We illustrate the procedure with several data sets from the literature.
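
The dynamic programming step can be sketched as below: given objects already placed in a fixed order, the recursion finds the split into K consecutive groups with the smallest sum of squared error. This is a minimal sketch that assumes the order is supplied; the quadratic assignment heuristic that produces the order is not shown, and the names `segment_sse` and `order_constrained_partition` are hypothetical.

```python
def segment_sse(points, start, end):
    """Sum of squared deviations from the centroid for points[start:end]."""
    seg = points[start:end]
    dim = len(seg[0])
    centroid = [sum(p[j] for p in seg) / len(seg) for j in range(dim)]
    return sum((p[j] - centroid[j]) ** 2 for p in seg for j in range(dim))


def order_constrained_partition(points, k):
    """Best split of ordered points into k consecutive clusters.

    Returns (minimum SSE, list of (start, end) index pairs, end exclusive).
    """
    n = len(points)
    inf = float("inf")
    # best[c][i]: minimum SSE for the first i objects split into c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for i in range(c, n + 1):
            for s in range(c - 1, i):
                candidate = best[c - 1][s] + segment_sse(points, s, i)
                if candidate < best[c][i]:
                    best[c][i] = candidate
                    cut[c][i] = s
    # Walk the stored cut points backwards to recover the cluster boundaries.
    bounds = []
    i = n
    for c in range(k, 0, -1):
        s = cut[c][i]
        bounds.append((s, i))
        i = s
    bounds.reverse()
    return best[k][n], bounds


# Hypothetical toy data: six univariate objects, already in the constrained order.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
sse, bounds = order_constrained_partition(points, 3)
```

The result is optimal only among partitions that respect the given order, which is why the unconstrained K-means criterion is not guaranteed to reach its minimum.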

Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five. Acknowledgements: Sara Dickson, Vidya Nair, and Beth Means assisted with the neural network analyses.

The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch‐and‐bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch‐and‐bound approach.

Cluster differences scaling is a method for partitioning a set of objects into classes and simultaneously finding a low-dimensional spatial representation of K cluster points, to model a given square table of dissimilarities among n stimuli or objects. The least squares loss function of cluster differences scaling, originally defined only on the residuals of pairs of objects that are allocated to different clusters, is extended with a loss component for pairs that are allocated to the same cluster. It is shown that this extension makes the method equivalent to multidimensional scaling with cluster constraints on the coordinates. A decomposition of the sum of squared dissimilarities into contributions from several sources of variation is described, including the appropriate degrees of freedom for each source. After developing a convergent algorithm for fitting the cluster differences model, it is argued that the individual objects and the cluster locations can be jointly displayed in a configuration obtained as a by-product of the optimization. Finally, the paper introduces a fuzzy version of the loss function, which can be used in a successive approximation strategy for avoiding local minima. A simulation study demonstrates that this strategy significantly outperforms two other well-known initialization strategies, and that it has a success rate of 92 out of 100 in attaining the global minimum.

The discovery of Hengxian and the formation of the category of hengxian are an important recapitulation and creative integration of the theory of the ontological Dao (Tao) in the Pre-Qin period. The cosmology of “self-creating and self-functioning” in Hengxian and the theory of “self-creating and self-evolving” in Liezi and Zhuangzi can be mutually interpreted. It indicates that the theory of transformation of qi entered a quite mature state in the Warring States Period. Translated from Qilu Xuekan 齐鲁学刊 (Qilu Academic Journal), 2005 (1) by Yan Xin

Factor analysis is arguably one of the most important tools in the science of mental abilities. While many studies have been conducted to make recommendations regarding “best practices” concerning its use, the degree to which contemporary ability researchers abide by those standards is unknown. The current study sought to evaluate the typical practices of contemporary ability researchers. We analyzed articles reporting factor analyses of cognitive ability tests administered to adult samples over a 12-year period. Results suggest that, in aggregate, the science of mental abilities seems to be doing well with respect to the issues of sample size, number of indicators (relative to number of factors) and breadth of indicators. Further, our results suggest that the majority of ability researchers are using methods of factor analysis that allow for the identification of a g factor. However, 14.57% failed to use a method that allowed a common factor to emerge. These results provide insights regarding the methodological quality of the science of mental abilities, and will hopefully encourage further “introspective” research into the science of mental abilities.

Described an intervention program designed to prepare elementary school (K-8) eighth-grade students for their transition to high school the following year. Participants in the study were 145, predominantly Hispanic, inner-city public school adolescents. The experimental group received an augmented condition, consisting of Education and Peer Support Components. The control group received a minimal condition consisting of only the Education Component. While no group effects were observed, time effects indicated experimental and control students' improved perceptions of school readiness, but deteriorated perceptions of support from both home and school and diminished grade-point averages and attendance. Time effects also revealed variable changes in school perceptions. Findings are discussed in terms of a developmental perspective of the school transition process. Implications for high school transition programming with the target population and directions for future research are also addressed.

The topic of insurance coverage and justification letters for cancer predisposition testing has been the subject of much discussion on the National Society of Genetic Counselors Cancer Special Interest Group (NSGC Cancer-SIG) listserv. Some counselors have stated that they have had difficulty in obtaining insurance coverage for their patients, while others have indicated that they would appreciate seeing examples of successful letters. The purpose of this paper is to provide practical guidance in writing successful letters of justification and to share insurance success stories in the area of cancer genetic testing.

The examination of names and words constitutes an important aspect of the philosophy of Zhuangzi. With the debate over the relationship between name and reality as its background, this examination not only involves the connection between form and meaning, but also targets the connection between concepts and objects. The debate over the relationship between name and reality correlates with the discussion of the connection between words and meanings or ideas. For Zhuangzi, the function of names and words is first and foremost embodied as the classification and distinction of being, while the Dao, as the universal principle of being, is characterized by equality and throughness. This leads to an inherent disparity and tension between names, words and the Dao. Zhuangzi's thinking and argument concern the connections between name and reality, words and ideas, and the Dao and words. This displays multiple theoretical perspectives and the complexity of its thought. Translated by Xiao Mo from Zhongguo Shehui Kexue 中国社会科学 (Social Sciences in China), 2006, (4): 38–49

An extension of multiple correspondence analysis is proposed that takes into account cluster-level heterogeneity in respondents’ preferences/choices. The method involves combining multiple correspondence analysis and k-means in a unified framework. The former is used for uncovering a low-dimensional space of multivariate categorical variables while the latter is used for identifying relatively homogeneous clusters of respondents. The proposed method offers an integrated graphical display that provides information on cluster-based structures inherent in multivariate categorical data as well as the interdependencies among the data. An empirical application is presented which demonstrates the usefulness of the proposed method and how it compares to several extant approaches. The work reported in this paper was supported by Grant 290439 and Grant A6394 from the Natural Sciences and Engineering Research Council of Canada to the first and third authors, respectively. We wish to thank Ulf Böckenholt, Paul Green, and Marc Tomiuk for their insightful comments on an earlier version of this paper. We also wish to thank Byunghwa Yang for generously providing us with his data.

The reason for the emergence of consciousness of filial piety is that parental care could activate reciprocal filial piety. Parental care and filial piety are two supplementary phenomena caused by the same time consciousness. Phenomenology neglects consciousness of filial piety because it lacks the thinking that sees the fundamental “meaning of time” in the intersection of “past” and “future”. The consciousness of filial piety can only be really constituted by a human being’s personal experience. “Frustrations in personal life” and “breeding of children for oneself” are two occasions for an adult to fight against the separating effect of individualized consciousness and regain awareness of filial piety. Translated by Huang Deyuan from Beijing Daxue Xuebao 北京大学学报 (Journal of Peking University), 2006, (1): 14–24

The importance of appropriate test selection for a given research endeavor cannot be over-emphasized. Using samples drawn from eleven populations (differing in shape, peakedness, and density in the tails), this study investigates the small sample empirical powers of nine k-sample tests against ordered location alternatives under completely randomized designs. The results then are intended to aid the researcher in the selection of a particular procedure appropriate for a given endeavor. To highlight this, an industrial psychology application involving work productivity is presented. Research was supported in part by the Scholastic Assistance Program, Baruch College. The author wishes to thank Professors Matthew Goldstein, Shulamith Gross, David Levine, and Edward Wolf for their helpful comments when writing this paper. In addition, the author wishes to thank the referees and editor for their useful suggestions for improving the paper.

The meaning and properties of a commonly used index of reliability, S/L, were examined critically. It was found that the index does not reflect any conventional concept of reliability. When used for an identical behavioral observation session, it is not statistically correlated with other reliability indices. Within an observation session, the standardizing measure of L is beyond the control of the investigator. Furthermore, the reason for the choice of L as the standard is unclear. The role of chance agreement in S/L is not known. The exact interpretation of the index depends on which observer reports L. Overall, the conceptual and mathematical meaning of S/L is dubious. It is suggested that the S/L index should not be used until its nature is shown to be a measure of reliability. Other approaches such as the intraclass correlations and generalizability coefficients should be used instead. The authors are indebted to Johnny Matson for his critique of an earlier version of this paper.

Recent research on the DSM-IV subtypes of attention-deficit/hyperactivity disorder (ADHD) has demonstrated that the subtypes differ in demographic characteristics, types of functional impairment, and profiles of comorbidity with other childhood disorders. However, little research has tested whether the subtypes differ in underlying neuropsychological deficits. This study compared the neuropsychological profiles of children without ADHD (n = 82) and children who met symptom criteria for DSM-IV Predominantly Inattentive subtype (ADHD-IA; n = 67), Predominantly Hyperactive Impulsive subtype (ADHD-HI; n = 14), and Combined subtype (ADHD-C; n = 33) in the areas of processing speed, vigilance, and inhibition. We hypothesized that children with elevations of inattention symptoms (ADHD-IA and ADHD-C) would be impaired on measures of vigilance and processing speed, whereas children with significant hyperactivity/impulsivity (ADHD-HI and ADHD-C) would be impaired on measures of inhibition. Contrary to prediction, symptoms of inattention best predicted performance on all dependent measures, and ADHD-IA and ADHD-C children had similar profiles of impairment. In contrast, children with ADHD-HI were not significantly impaired on any dependent measures once subclinical symptoms of inattention were controlled. Our results do not support distinct neuropsychological deficits in ADHD-IA and ADHD-C children, and suggest that symptoms of inattention, rather than symptoms of hyperactivity/impulsivity, are associated with neuropsychological impairment.

In this paper we consider the relations existing between four deductive systems that have been called categorial grammars and have relevant connections with linguistic investigations: the syntactic calculus, bilinear logic, compact bilinear logic and Curry's semantic calculus.

