Improving Resource Efficiency at Scale with Heracles
Venue
ACM Transactions on Computer Systems (TOCS), vol. 34 (2016), 6:1-6:33
Publication Year
2016
Authors
David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis
Abstract
User-facing, latency-sensitive services, such as websearch, underutilize their
computing resources during daily periods of low traffic. Reusing those resources
for other tasks is rarely done in production services because contention for
shared resources can cause latency spikes that violate the service-level objectives
of latency-sensitive tasks. The resulting underutilization hurts both the
affordability and energy efficiency of large-scale datacenters. With the slowdown
in technology scaling caused by the sunsetting of Moore’s law, it becomes increasingly
important to exploit this opportunity. We present Heracles, a feedback-based controller that
enables the safe colocation of best-effort tasks alongside a latency-critical
service. Heracles dynamically manages multiple hardware and software isolation
mechanisms, such as CPU, memory, and network isolation, to ensure that the
latency-sensitive job meets its latency targets while maximizing the resources given to
best-effort tasks. We evaluate Heracles using production latency-critical and batch
workloads from Google and demonstrate average server utilizations of 90% without
latency violations across all the load and colocation scenarios that we evaluated.
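The abstract describes Heracles' feedback control only at a high level. The sketch below illustrates the general idea of a slack-based feedback loop: periodically compare the latency-critical service's measured tail latency against its SLO target, then decide whether best-effort tasks may run, may grow, or must give resources back. It is a minimal sketch under stated assumptions, not Heracles' actual controller. The threshold values, the `control_step`, `latency_slack`, and `Decision` names, and the two-core reclaim step are all hypothetical. The per-resource isolation mechanisms the abstract lists (CPU, memory, and network) are not modeled; the sketch only produces a top-level decision.

```python
from dataclasses import dataclass

# Hypothetical policy constants for illustration only; they are NOT the
# thresholds used by Heracles.
LOAD_LIMIT = 0.85        # above this LC load fraction, best-effort (BE) work is paused
SLACK_NO_GROWTH = 0.10   # below this slack, BE tasks may not grow
SLACK_SHRINK = 0.05      # below this slack, BE tasks must give cores back
CORES_TO_RECLAIM = 2     # hypothetical number of cores reclaimed per step


@dataclass(frozen=True)
class Decision:
    """What the (hypothetical) top-level controller allows BE tasks to do."""
    be_enabled: bool
    allow_growth: bool
    cores_to_reclaim: int


def latency_slack(tail_latency_ms: float, slo_ms: float) -> float:
    """Fractional headroom: positive when the tail latency is under the SLO."""
    if slo_ms <= 0:
        raise ValueError("slo_ms must be positive")
    return (slo_ms - tail_latency_ms) / slo_ms


def control_step(tail_latency_ms: float, slo_ms: float, lc_load: float) -> Decision:
    """One iteration of a slack-based feedback loop (sketch, not Heracles itself).

    lc_load is the latency-critical service's load as a fraction of its peak.
    """
    slack = latency_slack(tail_latency_ms, slo_ms)
    if slack < 0:
        # SLO already violated: disable BE work entirely (which frees all of
        # its resources, so no incremental reclaim count is needed).
        return Decision(be_enabled=False, allow_growth=False, cores_to_reclaim=0)
    if lc_load > LOAD_LIMIT:
        # Near peak load: leave all resources to the latency-critical service.
        return Decision(be_enabled=False, allow_growth=False, cores_to_reclaim=0)
    if slack < SLACK_SHRINK:
        # Little headroom left: keep BE running but take some cores back.
        return Decision(be_enabled=True, allow_growth=False,
                        cores_to_reclaim=CORES_TO_RECLAIM)
    if slack < SLACK_NO_GROWTH:
        # Some headroom: keep the current BE allocation but do not grow it.
        return Decision(be_enabled=True, allow_growth=False, cores_to_reclaim=0)
    # Plenty of headroom: BE tasks may receive more resources.
    return Decision(be_enabled=True, allow_growth=True, cores_to_reclaim=0)


if __name__ == "__main__":
    # Hypothetical measurements: (tail latency in ms, SLO in ms, LC load fraction).
    samples = [(5.0, 10.0, 0.4), (9.2, 10.0, 0.4), (9.7, 10.0, 0.4), (12.0, 10.0, 0.4)]
    for sample in samples:
        print(sample, control_step(*sample))
```

The sketch checks for a violated SLO first, so the most urgent condition always wins. In a real deployment, a step like this would run periodically, and each decision would be passed to the actual isolation mechanisms. Those actuators are outside the scope of this sketch.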
