Heracles: Improving Resource Efficiency at Scale
Venue
Proceedings of the 42nd Annual International Symposium on Computer Architecture (2015)
Publication Year
2015
Authors
David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis
Abstract
User-facing, latency-sensitive services, such as websearch, underutilize their
computing resources during daily periods of low traffic. Reusing those resources
for other tasks is rarely done in production services since the contention for
shared resources can cause latency spikes that violate the service-level objectives
of latency-sensitive tasks. The resulting underutilization hurts both the
affordability and energy-efficiency of large-scale datacenters. With technology
scaling slowing down, it becomes important to address this opportunity. We present
Heracles, a feedback-based controller that enables the safe colocation of
best-effort tasks alongside a latency-critical service. Heracles dynamically
manages multiple hardware and software isolation mechanisms, such as CPU, memory,
and network isolation, to ensure that the latency-sensitive job meets latency
targets while maximizing the resources given to best-effort tasks. We evaluate
Heracles using production latency-critical and batch workloads from Google and
demonstrate average server utilizations of 90% without latency violations across
all the load and colocation scenarios that we evaluated.
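
As a rough illustration of the feedback idea described in the abstract, the sketch below shows one control step that compares measured tail latency to its target and adjusts how many cores best-effort tasks may use. It is not the controller from the paper: the function name `next_be_cores`, the slack thresholds, and the one-core step size are all assumptions made for this example, and it covers only CPU cores, whereas Heracles also manages memory and network isolation.

```python
def next_be_cores(be_cores, tail_latency_ms, target_ms, total_cores,
                  grow_slack=0.10, shrink_slack=0.05):
    """Return the core count for best-effort (BE) tasks after one control step.

    All names and thresholds here are hypothetical, chosen for illustration.
    Slack is the fraction of the latency target that is still unused.
    """
    slack = (target_ms - tail_latency_ms) / target_ms

    if slack < 0:
        # Latency target violated: take every core back from BE tasks.
        return 0
    if slack < shrink_slack:
        # Very close to the target: shrink the BE share by one core.
        return max(be_cores - 1, 0)
    if slack > grow_slack:
        # Comfortable headroom: grow the BE share, but always leave
        # at least one core for the latency-critical service.
        return min(be_cores + 1, total_cores - 1)
    # In between the two thresholds: hold the current allocation.
    return be_cores


if __name__ == "__main__":
    cores = 4
    # Hypothetical tail-latency samples (ms) against a 100 ms target.
    for latency in (50.0, 60.0, 92.0, 97.0, 120.0):
        cores = next_be_cores(cores, latency, target_ms=100.0, total_cores=16)
        print(f"latency={latency} ms -> BE cores={cores}")
```

Using two thresholds with a hold band between them is a design choice of this sketch: it keeps the allocation from oscillating when latency hovers near a single cutoff.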
