Characterizing Task Usage Shapes in Google Compute Clusters
Venue
Proceedings of the 5th International Workshop on Large Scale Distributed Systems and Middleware (2011)
Publication Year
2011
Authors
Qi Zhang, Joseph Hellerstein, Raouf Boutaba
Abstract
The increase in scale and complexity of large compute clusters motivates a need
for representative workload benchmarks to evaluate the performance impact of
system changes, so as to assist in designing better scheduling algorithms and in
carrying out management activities. To achieve this goal, it is necessary to
construct workload characterizations from which realistic performance benchmarks
can be created. In this paper, we focus on characterizing run-time task resource
usage for CPU, memory and disk. The goal is to find an accurate characterization
that can faithfully reproduce the performance of historical workload traces in
terms of key performance metrics, such as task wait time and machine resource
utilization. Through experiments using workload traces from Google production
clusters, we find that simply using the mean of task usage can generate synthetic
workload traces that accurately reproduce resource utilizations and task waiting
time. This seemingly surprising result can be justified by the fact that resource
usage for CPU, memory and disk is relatively stable over time for the majority
of the tasks. Our work not only presents a simple technique for constructing
realistic workload benchmarks, but also provides insights into understanding
workload performance in production compute clusters.
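
As a rough illustration of the idea in the abstract, the sketch below collapses each task's usage time series into a single mean (CPU, memory, disk) tuple and computes a coefficient of variation as one possible way to check how stable a task's usage is. This is not the authors' code: the trace layout, the sample values, and the function names `mean_usage_shapes` and `cpu_cv` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical trace samples: (task_id, cpu, memory, disk), one row per
# measurement interval. The values are invented for illustration only.
samples = [
    ("t1", 0.50, 0.20, 0.10),
    ("t1", 0.52, 0.20, 0.10),
    ("t1", 0.48, 0.20, 0.10),
    ("t2", 0.10, 0.40, 0.05),
    ("t2", 0.30, 0.40, 0.05),
]

def mean_usage_shapes(rows):
    """Collapse each task's time series into one mean (cpu, mem, disk) tuple."""
    by_task = defaultdict(list)
    for task_id, *usage in rows:
        by_task[task_id].append(usage)
    return {
        task_id: tuple(mean(col) for col in zip(*series))
        for task_id, series in by_task.items()
    }

def cpu_cv(rows):
    """Coefficient of variation of CPU usage per task (lower means more stable)."""
    by_task = defaultdict(list)
    for task_id, cpu, _mem, _disk in rows:
        by_task[task_id].append(cpu)
    return {t: pstdev(v) / mean(v) for t, v in by_task.items()}

shapes = mean_usage_shapes(samples)   # one synthetic "usage shape" per task
stability = cpu_cv(samples)           # assumed stability measure, CPU only
```

In this toy data, task `t1` has nearly constant CPU usage, so its mean is a close stand-in for the full series, while `t2` varies more and would be represented less faithfully by its mean alone.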
