A cautionary note on using internal cross validation to select the number of clusters

Authors:	Abba M. Krieger, Paul E. Green

Affiliation:	(1) Department of Statistics, University of Pennsylvania, USA; (2) Marketing Department, The Wharton School, University of Pennsylvania, 1400 Steinberg Hall-Dietrich Hall, Philadelphia, PA 19104-6371

Abstract:	A highly popular method for examining the stability of a data clustering is to split the data into two parts, cluster the observations in Part A, assign the objects in Part B to their nearest centroid in Part A, and then independently cluster the Part B objects. One then examines how close the two partitions are (say, by the Rand measure). Another proposal is to split the data into k parts, and see how their centroids cluster. By means of synthetic data analyses, we demonstrate that these approaches fail to identify the appropriate number of clusters, particularly as sample size becomes large and the variables exhibit higher correlations. The authors express their thanks to the Sol C. Snider Entrepreneurial Center, Wharton School, for support of this project.
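The split-half procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract names no clustering algorithm, so plain k-means (Lloyd's algorithm) is assumed here, and the function names `kmeans`, `assign`, `rand_index`, and `split_half_rand` are hypothetical. The Rand measure is computed as the fraction of object pairs on which the two partitions of Part B agree.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means (an assumption; the abstract names no algorithm)."""
    centroids = rng.sample(points, k)  # k distinct observations as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def rand_index(a, b):
    """Fraction of pairs that both labelings treat alike (together or apart)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


def split_half_rand(points, k, rng):
    """Split-half check: compare two partitions of Part B for a given k."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    part_a, part_b = shuffled[:half], shuffled[half:]
    centroids_a, _ = kmeans(part_a, k, rng)    # cluster Part A
    labels_from_a = assign(part_b, centroids_a)  # Part B -> nearest Part A centroid
    _, labels_own = kmeans(part_b, k, rng)     # cluster Part B independently
    return rand_index(labels_from_a, labels_own)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative synthetic data: two well-separated groups in two dimensions.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    data += [(rng.gauss(50, 1), rng.gauss(50, 1)) for _ in range(60)]
    for k in (2, 3, 4):
        print(k, round(split_half_rand(data, k, rng), 3))
```

Under this rule one would choose the k with the highest agreement score. The abstract's caution is that this choice can be misleading, particularly for large samples and correlated variables, so the sketch shows the mechanics only and is not a recommended selection method.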

Keywords:	cluster analysis cross-validation stopping rules
This article is indexed in SpringerLink and other databases.