Parallel Spectral Clustering
Venue
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Springer (2008), pp. 374-389
Publication Year
2008
Authors
Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, Edward Chang
BibTeX
Abstract
Spectral clustering algorithm has been shown to be more eective in nding clusters
than most traditional algorithms. However, spectral clustering suers from a
scalability problem in both memory use and computational time when a dataset size
is large. To perform clustering on large datasets, we propose to parallelize both
memory use and computation on distributed computers. Through an empirical study on
a large document dataset of 193,844 data instances and a large photo dataset of
637,137, we demonstrate that our parallel algorithm can effectively alleviate the
scalability problem.
