A cautionary note on using internal cross validation to select the number of clusters
Authors:Abba M. Krieger  Paul E. Green
Affiliation:(1) Department of Statistics, University of Pennsylvania, USA;(2) Marketing Department, The Wharton School, University of Pennsylvania, 1400 Steinberg Hall-Dietrich Hall, 19104-6371 Philadelphia, PA
Abstract:A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the objects in Part B to their nearest centroid in Part A, and then independently cluster the Part B objects. One then examines how close the two partitions are (say, by the Rand measure). Another proposal is to split the data into k parts, and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as sample size becomes large and the variables exhibit higher correlations. The authors express their thanks to the Sol C. Snider Entrepreneurial Center, Wharton School, for support of this project.
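The split-half procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means (Lloyd's algorithm) is assumed here, the unadjusted Rand index stands in for "the Rand measure", and every function name (`kmeans`, `nearest`, `rand_index`, `split_half_stability`) is hypothetical.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct observations
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    # Final labels are computed against the centroids that are returned.
    return centroids, [nearest(p, centroids) for p in points]


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two partitions agree."""
    n = len(labels_a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            if same_a == same_b:
                agree += 1
    return agree / (n * (n - 1) / 2)


def split_half_stability(points, k, seed=0):
    """Split-half agreement for a candidate number of clusters k."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    part_a, part_b = shuffled[:half], shuffled[half:]

    centroids_a, _ = kmeans(part_a, k, seed=seed)             # cluster Part A
    assigned = [nearest(p, centroids_a) for p in part_b]      # B -> nearest A centroid
    _, clustered = kmeans(part_b, k, seed=seed)               # cluster Part B on its own
    return rand_index(assigned, clustered)                    # compare the two partitions


# Toy data: two well-separated groups on a line (illustrative only).
data = [(i * 0.1, 0.0) for i in range(10)] + [(100 + i * 0.1, 0.0) for i in range(10)]
scores = {k: split_half_stability(data, k) for k in (2, 3, 4)}
```

A rule of the kind the abstract cautions against would pick the k with the highest score in `scores`; the authors' synthetic-data results indicate that such a choice can miss the appropriate number of clusters, particularly with large samples and correlated variables.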
Keywords:cluster analysis  cross-validation  stopping rules
This article is indexed in SpringerLink and other databases.