Characterizing Task Usage Shapes in Google Compute Clusters
Abstract
The growing scale and complexity of large compute clusters create a need for representative workload benchmarks with which to evaluate the performance impact of system changes, both to aid the design of better scheduling algorithms and to support cluster management activities. Building such benchmarks requires workload characterizations from which realistic performance benchmarks can be created. In this paper, we focus on characterizing run-time task resource usage for CPU, memory, and disk. The goal is an accurate characterization that faithfully reproduces the performance of historical workload traces in terms of key performance metrics, such as task wait time and machine resource utilization. Through experiments using workload traces from Google production clusters, we find that simply using the mean resource usage of each task generates synthetic workload traces that accurately reproduce resource utilization and task wait time. This seemingly surprising result is explained by the observation that CPU, memory, and disk usage is relatively stable over time for the majority of tasks. Our work not only presents a simple technique for constructing realistic workload benchmarks, but also provides insights into workload performance in production compute clusters.