Publication Data
Parallel Spectral Clustering
Abstract: Spectral clustering algorithms have been shown to be more
effective in finding clusters than most traditional algorithms. However, spectral
clustering suffers from a scalability problem in both memory use and computational time
when the dataset is large. To perform clustering on large datasets, we propose to
parallelize both memory use and computation on distributed computers. Through an
empirical study on a large document dataset of 193,844 data instances and a large photo
dataset of 637,137 data instances, we demonstrate that our parallel algorithm can effectively
alleviate the scalability problem.
