A classifier for the latency-CPU behaviors of serving jobs in distributed environments
Venue
SoCC '15 (2015) (to appear)
Publication Year
2015
Authors
Christophe Restif, Natalia Ponomareva, Krzysztof Ostrowski
Abstract
The end-to-end latency of serving jobs in distributed and shared environments, such as
a cloud, is an important metric for jobs' owners and infrastructure providers. Yet
it is notoriously challenging to model precisely, since it is affected by a large
collection of unrelated moving pieces, from the software design to the job
schedulers' strategies. In this work we present a novel approach to modeling
latency, by tracking how it varies with CPU usage. We train a classifier to
automatically assign the latency behavior of methods to one of three classes: constant
latency regardless of CPU, latency uncorrelated with CPU, and latency predictable as
a function of CPU. We apply our model to a random sample of serving jobs running on
the Google infrastructure. We illustrate unexpected and insightful patterns of
latency variations with CPU. The visualization of latency-CPU variations and the
corresponding class may be used by both jobs' owners and infrastructure providers,
for a variety of applications, such as smarter latency alerting, latency-aware
configuration of jobs, and automated detection of changes in behavior, either over
time, during pre-release testing, or across data centers.
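
To make the three classes concrete, here is a minimal sketch in Python. It is not the trained classifier described in the abstract, whose features and training procedure are not given here. It is a hand-written heuristic that assumes paired samples of CPU usage and latency for one method. The function names, the thresholds `flat_cv` and `min_abs_r`, and the use of a Pearson correlation (which only captures linear dependence) are all illustrative assumptions.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def classify_latency_cpu(cpu, latency, flat_cv=0.05, min_abs_r=0.8):
    """Assign paired (cpu, latency) samples to one of three illustrative classes.

    flat_cv and min_abs_r are hypothetical thresholds chosen for this sketch,
    not values taken from the paper.
    """
    if len(cpu) != len(latency) or len(cpu) < 3:
        raise ValueError("need at least three paired samples")
    avg = mean(latency)
    # Coefficient of variation: how much latency moves relative to its mean.
    cv = pstdev(latency) / avg if avg > 0 else 0.0
    if cv <= flat_cv:
        return "constant"        # latency barely changes, whatever the CPU usage
    if abs(pearson(cpu, latency)) >= min_abs_r:
        return "predictable"     # latency follows CPU usage closely (linearly here)
    return "uncorrelated"        # latency varies, but not with CPU usage


cpu = [0.1, 0.2, 0.3, 0.4, 0.5]
print(classify_latency_cpu(cpu, [10.0, 10.1, 9.9, 10.0, 10.1]))
print(classify_latency_cpu(cpu, [10.0, 20.0, 30.0, 40.0, 50.0]))
print(classify_latency_cpu(cpu, [10.0, 50.0, 10.0, 50.0, 10.0]))
```

A learned classifier would replace the two fixed thresholds with decision boundaries fitted on labeled examples, and could capture non-linear relationships between latency and CPU that this heuristic would label as uncorrelated.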
