# Round Compression for Parallel Matching Algorithms

### Venue

STOC 2018 (to appear)

### Publication Year

2018

### Authors

Aleksander Mądry, Artur Czumaj, Krzysztof Onak, Jakub Łącki, Piotr Sankowski, Slobodan Mitrović

### BibTeX

## Abstract

For over a decade now we have been witnessing the success of massive parallel
computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of
the reasons for their success is the fact that these frameworks are able to
accurately capture the nature of large-scale computation. In particular, compared
to the classic distributed algorithms or PRAM models, these frameworks allow for
much more local computation. The fundamental question that arises in this context
is though: can we leverage this additional power to obtain even faster parallel
algorithms? A prominent example here is the maximum matching problem---one of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2-approximate maximum matching in O(log n) rounds. However, the exact
complexity of this problem in the MPC framework is still far from understood.
Lattanzi et al. (SPAA 2011) showed that if each machine has n^(1+Omega(1)) memory,
this problem can also be solved 2-approximately in a constant number of rounds.
These techniques, as well as the approaches developed in the follow up work, seem
though to get stuck in a fundamental way at roughly O(log n) rounds once we enter
the (at most) near-linear memory regime. It is thus entirely possible that in this
regime, which captures in particular the case of sparse graph computations, the
best MPC round complexity matches what one can already get in the PRAM model,
without the need to take advantage of the extra local computation power. In this
paper, we finally refute that perplexing possibility. That is, we break the above
O(log n) round complexity bound even in the case of slightly sublinear memory per
machine. In fact, our improvement here is {\em almost exponential}: we are able to
deliver a (2+epsilon)-approximation to maximum matching, for any fixed constant
epsilon>0, in O((log log n)^2) rounds. To establish our result we need to
deviate from the previous work in two important ways that are crucial for
exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we
use vertex--based graph partitioning, instead of the edge--based approaches that
were utilized so far. Secondly, we develop a technique of round compression. This
technique enables one to take a (distributed) algorithm that computes an
O(1)-approximation of maximum matching in O(log n) independent PRAM phases and
implement a super-constant number of these phases in only a constant number of MPC
rounds.