Achieving Rapid Response Times in Large Online Services
Venue
Talk given at Berkeley AMPLab Cloud Seminar, March 26, 2012
Publication Year
2012
Abstract
Today’s large-scale web services provide rapid responses to interactive requests by
applying large amounts of computational resources to massive datasets. They
typically operate in warehouse-sized datacenters and run on clusters of machines
that are shared across many kinds of interactive and batch jobs. As these systems
distribute work to ever larger numbers of machines and sub-systems in order to
provide interactive response times, it becomes increasingly difficult to tightly
control latency variability across these machines, and the 95th- and 99th-percentile
response times often suffer in efforts to improve average response times. As systems
scale up, simply stamping out every source of variability does not work. Just as
fault-tolerance techniques had to be developed once guaranteeing fault-free
operation by design became infeasible, techniques that deliver predictably low
service-level latency in the presence of highly variable individual components become
increasingly important at larger scales. In this talk, I’ll describe a collection
of techniques and practices for lowering response times in large distributed systems
whose components run on shared clusters of machines, where pieces of these systems
are subject to interference by other tasks, and where unpredictable latency hiccups
are the norm, not the exception. Some of the techniques adapt to trends observed
over periods of a few minutes, making them effective at dealing with longer-lived
interference or resource contention. Others react to latency anomalies within a few
milliseconds, making them suitable for mitigating variability within the context of
a single interactive request. I’ll discuss examples of how these techniques are
used in various pieces of Google’s systems infrastructure and in various
higher-level online services.
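To make the first family concrete — techniques that adapt to trends observed over minutes — one common pattern is latency-aware replica selection: the client keeps a smoothed estimate of each replica's latency and steers traffic away from replicas suffering longer-lived interference or contention. The sketch below is illustrative only (the class and replica names are hypothetical, not from the talk), using an exponentially weighted moving average as the smoothing mechanism:

```python
import random

class LatencyAwareSelector:
    """Illustrative sketch: pick the replica with the lowest exponentially
    weighted moving average (EWMA) of observed request latency. Because the
    estimate adapts over many requests, the client gradually avoids a
    replica experiencing longer-lived interference."""

    def __init__(self, replica_ids, alpha=0.1):
        self.alpha = alpha                        # smoothing factor in (0, 1]
        self.ewma = {r: 0.0 for r in replica_ids}  # latency estimate per replica

    def pick(self):
        # Route to the replica with the lowest current latency estimate.
        return min(self.ewma, key=self.ewma.get)

    def record(self, replica, latency):
        # Fold the new observation into the moving average.
        self.ewma[replica] = ((1 - self.alpha) * self.ewma[replica]
                              + self.alpha * latency)

sel = LatencyAwareSelector(["a", "b", "c"])
# Simulated observations: replica "b" is contended and consistently slow.
for _ in range(100):
    sel.record("a", random.uniform(0.005, 0.015))
    sel.record("b", random.uniform(0.050, 0.080))
    sel.record("c", random.uniform(0.010, 0.020))
print(sel.pick())  # steers away from the contended replica "b"
```

A reaction time of many samples is the point here: a single latency spike barely moves the estimate, but sustained contention over minutes shifts traffic decisively.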
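The second family — reacting to latency anomalies within a few milliseconds, inside a single request — is exemplified by hedged (backup) requests, a technique Dean and Barroso later described in "The Tail at Scale": send the request to one replica, and if no reply arrives within a short hedge delay, issue a backup request to a second replica and take whichever answer comes first. The following asyncio sketch is a minimal illustration under simulated latencies, not Google's implementation:

```python
import asyncio
import random

async def query_replica(replica_id: int) -> str:
    # Hypothetical backend call with variable latency (1-50 ms, simulated).
    await asyncio.sleep(random.uniform(0.001, 0.05))
    return f"result-from-replica-{replica_id}"

async def hedged_request(replicas, hedge_delay=0.01):
    """Send to the primary replica; if it has not answered within
    hedge_delay seconds, send a backup request to a second replica
    and return whichever reply arrives first."""
    primary = asyncio.ensure_future(query_replica(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()            # fast path: no hedge needed
    backup = asyncio.ensure_future(query_replica(replicas[1]))
    done, pending = await asyncio.wait({primary, backup},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:                   # reap the loser cleanly
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

result = asyncio.run(hedged_request([0, 1]))
print(result)
```

The hedge delay is typically set near a high percentile of the latency distribution (e.g. the 95th), so backups are sent only for the slow tail and the extra load stays small.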
