Cluster Forests
Venue
Computational Statistics and Data Analysis, vol. 66 (2013), pp. 178-192
Publication Year
2013
Authors
Donghui Yan, Aiyou Chen, Michael I Jordan
Abstract
With inspiration from Random Forests (RF) in the context of classification, a new
clustering ensemble method, Cluster Forests (CF), is proposed. Geometrically, CF
randomly probes a high-dimensional data cloud to obtain "good local clusterings"
and then aggregates via spectral clustering to obtain cluster assignments for the
whole dataset. The search for good local clusterings is guided by a cluster quality
measure kappa. CF progressively improves each local clustering in a fashion that
resembles the tree growth in RF. Empirical studies on several real-world datasets
under two different performance metrics show that CF compares favorably to its
competitors. Theoretical analysis reveals that the kappa measure makes it possible
to grow the local clustering in a desirable way: it is "noise-resistant". A
closed-form expression is obtained for the mis-clustering rate of spectral
clustering under a perturbation model, which yields new insights into some aspects
of spectral clustering.
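
The pipeline outlined in the abstract (probe random feature subsets, grow each local clustering under a quality measure, then aggregate with spectral clustering) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not define kappa, so the sketch assumes it is the ratio of within-cluster to between-cluster scatter, with lower values being better. The use of k-means as the base clusterer, the one-feature-at-a-time growth rule, the co-membership affinity matrix, and all function and parameter names (`kmeans`, `kappa`, `grow_local_clustering`, `cluster_forest`, `n_trees`, `n_steps`) are likewise assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center in place if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def kappa(X, labels):
    """Assumed quality score: within-cluster / between-cluster scatter (lower is better)."""
    overall = X.mean(axis=0)
    within = between = 0.0
    for j in np.unique(labels):
        pts = X[labels == j]
        center = pts.mean(axis=0)
        within += ((pts - center) ** 2).sum()
        between += len(pts) * ((center - overall) ** 2).sum()
    return within / max(between, 1e-12)

def grow_local_clustering(X, k, rng, n_steps=5):
    """Start from one random feature; keep a candidate feature only if kappa improves."""
    n_features = X.shape[1]
    chosen = [int(rng.integers(n_features))]
    labels = kmeans(X[:, chosen], k, rng)
    best = kappa(X[:, chosen], labels)
    for _ in range(n_steps):
        remaining = [f for f in range(n_features) if f not in chosen]
        if not remaining:
            break
        trial = chosen + [int(rng.choice(remaining))]
        trial_labels = kmeans(X[:, trial], k, rng)
        score = kappa(X[:, trial], trial_labels)
        if score < best:
            chosen, labels, best = trial, trial_labels, score
    return labels

def cluster_forest(X, k, n_trees=20, seed=0):
    """Average co-membership over local clusterings, then cluster it spectrally."""
    rng = np.random.default_rng(seed)
    n = len(X)
    affinity = np.zeros((n, n))
    for _ in range(n_trees):
        labels = grow_local_clustering(X, k, rng)
        affinity += labels[:, None] == labels[None, :]
    affinity /= n_trees
    # Normalized spectral clustering: top-k eigenvectors of D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)  # strictly positive, since the diagonal is 1
    normalized = affinity / np.sqrt(np.outer(degree, degree))
    _, vecs = np.linalg.eigh(normalized)  # eigenvalues in ascending order
    embedding = vecs[:, -k:]
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(norms, 1e-12)
    return kmeans(embedding, k, rng)
```

Calling `cluster_forest(X, k=2)` on an `(n_samples, n_features)` float array returns one label per row. Because every random draw comes from a single seeded generator, repeated calls with the same `seed` give identical labels.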
