Optimizing Google's Warehouse Scale Computers: The NUMA Experience
Nineteenth International Symposium on High-Performance Computer Architecture, IEEE Computer Society, Los Alamitos, CA (2013)
This paper argues for a two-phase performance analysis methodology for optimizing warehouse-scale computers (WSCs) that combines an in-production investigation with an experimental load-testing approach. To demonstrate the effectiveness of this two-phase methodology, and to illustrate the challenges, methodologies, and opportunities in optimizing modern WSCs, this paper investigates the impact of non-uniform memory access (NUMA) on several of Google's key web-service workloads in large-scale production WSCs. Leveraging a newly designed metric and continuous large-scale profiling in live datacenters, our production analysis demonstrates that NUMA has a significant impact (10-20%) on two important web services: the Gmail backend and the search frontend. Our carefully designed load test further reveals surprising tradeoffs between optimizing for NUMA performance and reducing cache contention.