Max-Cover in Map-Reduce
Proceedings of the 19th international conference on World Wide Web, ACM, Raleigh, North Carolina (2010), pp. 231-240
Flavio Chierichetti, Ravi Kumar, Andrew Tomkins
We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm.