Distributed Balanced Clustering via Mapping Coresets
Venue
NIPS, Neural Information Processing Systems Foundation (2014)
Publication Year
2014
Authors
Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni
BibTeX
Abstract
Large-scale clustering of data points in metric spaces is an important problem in
mining big data sets. For many applications, we face explicit or implicit size
constraints for each cluster which leads to the problem of clustering under
capacity constraints or the balanced clustering'' problem. Although the balanced
clustering problem has been widely studied, developing a theoretically sound
distributed algorithm remains an open problem. In the present paper we develop a
general framework based on mapping coresets'' to tackle this issue. For a wide
range of clustering objective functions such as k-center, k-median, and k-means,
our techniques give distributed algorithms for balanced clustering that match the
best known single machine approximation ratios.
