Randomized Composable Core-sets for Distributed Submodular Maximization
Venue
STOC (2015), pp. 153-162
Publication Year
2015
Authors
Vahab S. Mirrokni, Morteza Zadimoghaddam
BibTeX
Abstract
An effective technique for solving optimization problems over massive data sets is
to partition the data into smaller pieces, solve the problem on each piece and
compute a representative solution from it, and finally obtain a solution inside the
union of the representative solutions for all pieces. This technique can be
captured via the concept of composable core-sets, and has been recently applied to
solve diversity maximization problems as well as several clustering problems
[7,15,8]. However, for coverage and submodular maximization problems, impossibility
bounds are known for this technique [15]. In this paper, we focus on efficient
construction of a randomized variant of composable core-sets where the above idea
is applied on a random clustering of the data. We employ this technique for the
coverage, monotone and non-monotone submodular maximization problems. Our results
significantly improve upon the hardness results for non-randomized core-sets, and
imply improved results for submodular maximization in a distributed and streaming
settings. The effectiveness of this technique has been confirmed empirically for
several machine learning applications [22], and our proof provides a theoretical
foundation to this idea. In summary, we show that a simple greedy algorithm results
in a 1/3-approximate randomized composable core-set for submodular maximization
under a cardinality constraint. Our result also extends to non-monotone submodular
functions, and leads to the first 2-round MapReduce-based constant-factor
approximation algorithm with O(n) total communication complexity for either
monotone or non-monotone functions. Finally, using an improved analysis technique
and a new algorithm PseudoGreedy, we present an improved 0.545-approximation
algorithm for monotone submodular maximization, which is in turn the first
MapReduce-based algorithm beating factor 1/2 in a constant number of rounds.
