Filtering: a method for solving graph problems in MapReduce.
Venue
SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 85-94
Publication Year
2011
Authors
Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, Sergei Vassilvitskii
BibTeX
Abstract
The MapReduce framework is currently the de facto standard used throughout both
industry and academia for petabyte scale data analysis. As the input to a typical
MapReduce computation is large, one of the key requirements of the framework is
that the input cannot be stored on a single machine and must be processed in
parallel. In this paper we describe a general algorithmic design technique in the
MapReduce framework called filtering. The main idea behind filtering is to reduce
the size of the input in a distributed fashion so that the resulting, much smaller,
problem instance can be solved on a single machine. Using this approach we give new
algorithms in the MapReduce framework for a variety of fundamental graph problems.
Specifically, we present algorithms for minimum spanning trees, maximal matchings,
approximate weighted matchings, approximate vertex and edge covers and minimum
cuts. In all of these cases, we will parameterize our algorithms by the amount of
memory available on the machines allowing us to show tradeoffs between the memory
available and the number of MapReduce rounds. For each setting we will show that
even if the machines are only given substantially sublinear memory, our algorithms
run in a constant number of MapReduce rounds. To demonstrate the practical
viability of our algorithms we implement the maximal matching algorithm that lies
at the core of our analysis and show that it achieves a significant speedup over
the sequential version.
