Robust and Probabilistic Failure-Aware Placements
Venue
ACM Symposium on Parallel Algorithms and Architectures (SPAA), California, USA (2016)
Publication Year
2016
Authors
Madhukar Korupolu, Rajmohan Rajaraman
BibTeX
Abstract
Motivated by the growing complexity of heterogeneity and hierarchical organization
in data centers, in this paper we address the problem of placing copies of tasks on
machines with the goal of increasing availability in the presence of failures. We
consider two models of failures: adversarial and probabilistic. In the adversarial
model, each node has a weight (higher weight implying higher reliability) and the
adversary can remove any subset of nodes of total weight at most a given bound W
and our goal is to find a placement that incurs the least disruption against such
an adversary. In the probabilistic model, each node has a probability of failure
and we want to find a placement that maximizes the probability that at least K out
of N tasks are available at any time. The problems in the first category covering
adversarial failures lie in $\Sigma_2$, the second level of the polynomial
hierarchy. We show that a basic variant, which we call RobustFAP, is co-NP-hard,
and an all-or-nothing version of RobustFAP is $\Sigma_2$-complete. Our first
algorithmic result here is is a PTAS for RobustFAP, a key ingredient of which is a
solution to a fractional variant of RobustFAP. Our second algorithmic result is for
fractional HierRobustFAP over hierarchies: a new Generalized Spreading algorithm,
which is simultaneously optimal for all W. This algorithm generalizes the classical
notion of {\em max-min fairness} to work with nodes of differing capacities,
differing reliability weights and hierarchical structures. We then apply randomized
rounding to obtain an algorithm for integral RobustFAP over hierarchies. For the
probabilistic version ProbFAP, we first focus on the single level problem and give
an algorithm that achieves an additive \eps approximation in the failure
probability while giving up a (1 + \eps) multiplicative factor in the number of
failures. We then extend the result to the hierarchical version achieving a \eps
additive approximation in failure probability while giving up a (L + \eps)
multiplicative factor in the number of failures, where L is the number of levels in
the hierarchy.
