Profiling a warehouse-scale computer
Venue
ISCA '15 Proceedings of the 42nd Annual International Symposium on Computer Architecture, ACM (2015), pp. 158-169
Publication Year
2015
Authors
Svilen Kanev, Juan Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, David Brooks
Abstract
With the increasing prevalence of warehouse-scale (WSC) and cloud computing,
understanding the interactions of server applications with the underlying
microarchitecture becomes ever more important in order to extract maximum
performance out of server hardware. To aid such understanding, this paper presents
a detailed microarchitectural analysis of live datacenter jobs, measured on more
than 20,000 Google machines over a three-year period, and comprising thousands of
different applications. We first find that WSC workloads are extremely diverse,
breeding the need for architectures that can tolerate application variability
without performance loss. However, some patterns emerge, offering opportunities for
co-optimization of hardware and software. For example, we identify common building
blocks in the lower levels of the software stack. This "datacenter tax" can
comprise nearly 30% of cycles across jobs running in the fleet, which makes its
constituents prime candidates for hardware specialization in future server
systems-on-chips. We also uncover opportunities for classic microarchitectural
optimizations for server processors, especially in the cache hierarchy. Typical
workloads place significant stress on instruction caches and prefer memory latency
over bandwidth. They also stall cores often, but compute heavily in bursts. These
observations motivate several interesting directions for future warehouse-scale
computers.
