Robert Hundt

Robert Hundt received a degree in Computer Science from the Technical University in Munich in 1992. Until 1999 he worked for Terrasat GmbH in Germany, a 20+ person R&D company he co-owned. He played many roles, from company lead to booth cat, while writing and optimizing software for surveying and navigation with satellite systems.

In 2000 he started working for Hewlett-Packard Company in California on bringing up the new and scalable high-level optimizer SYZYGY for the HP C/C++/FORTRAN compilers with a new inter-procedural optimizer, a new loop optimizer, and a new scalar optimizer. Before joining the compiler group, Robert was responsible for dynamic binary instrumentation for Intel Itanium processors, co-creating and designing the performance analysis tool HP Caliper.

Since the beginning of 2007, Robert has been working for Google. He has created various compiler and performance projects: he served as Tech Lead for compiler optimization for servers (x86), Android (ARM), and GPUs (open-source CUDA compiler), built datacenter profiling and performance analysis tools, and worked on Gmail/Apps performance, from Chrome to datacenter. For many years Robert was the SW lead for Google TPU, supercomputers that accelerate machine learning inference and training and that include the open-source TensorFlow compiler XLA. Today he is the TL for ML compilers, runtimes, and performance, for TPU, GPU, and CPU. In parallel, he works on the open-source High-Level Synthesis toolchain XLS and dabbles in Quantum Computing. He remains strongly engaged in compiler and datacenter research.

In real life, he enjoys spending time with his family, playing the piano (at which he sucks), playing volleyball (which he used to do fairly well), and everything related to delicious, high-quality food (his main reason for joining Google ;-)

Authored Publications
Quantum Computing for Programmers
Cambridge University Press, Cambridge CB2 8BS, United Kingdom (2022)
In-Datacenter Performance Analysis of a Tensor Processing Unit
Mark Omernick
Diemthu Le
Robert Hagmann
Kathy Nix
Clifford Chao
Jeremy Coriell
Pierre-luc Cantin
Andy Koch
Rahul Nagarajan
Mike Daley
Al Borchers
Chris Clark
Adriana Maggiore
Raminder Bajwa
Matt Dau
Ben Gelb
Alan Lundin
Ray Ni
Rick Boyle
Steve Lacy
Alek Jaworski
John Hu
Thomas Norrie
Aaron Jaffey
Rajendra Gottipati
James Law
Ravi Narayanaswami
Jonathan Ross
Harshit Khaitan
Kyle Lucke
C. Richard Ho
Alexander Kaplan
Andy Phelps
Narayana Penukonda
Nan Boden
Sarah Bates
Maire Mahony
William Gulland
Doug Hogberg
Gordon MacKean
Zhuyuan Liu
Tara Vazir Ghaemmaghami
Dan Hurt
Kieran Miller
Suresh Bhatia
Gaurav Agrawal
Julian Ibarz
Nishant Patil
Norman P. Jouppi
Naveen Kumar
Chris Leary
ISCA (2017) (to appear)
GPUCC - An Open-Source GPGPU Compiler
Xuetian Weng
Jingyue Wu
Rob Springer
Bjarke Roune
Mark Heffernan
Chris Leary
Proceedings of the 2016 International Symposium on Code Generation and Optimization, ACM, New York, NY, pp. 105-116
Whare-Map: Heterogeneity in “Homogeneous” Warehouse-Scale Computers
Lingjia Tang
Jason Mars
Proceedings of the 2013 ACM/IEEE International Symposium on Computer Architecture (ISCA), IEEE (to appear)
Optimizing Google's Warehouse Scale Computers: The NUMA Experience
Lingjia Tang
Robert Hagmann
Jason Mars
The 19th IEEE International Symposium on High Performance Computer Architecture (2013)
JSWhiz - Static Analysis for JavaScript Memory Leaks
Proceedings of the 10th annual IEEE/ACM international symposium on Code generation and optimization, IEEE (2013)
MAO - an Extensible Micro-Architectural Optimizer
Martin Thuresson
Neil Vachharajani
Easwaran Raman
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, ACM (2011)
The Impact of Memory Subsystem Resource Sharing on Datacenter Applications
Mary-Lou Soffa
Jason Mars
Neil Vachharajani
Lingjia Tang
ISCA, ACM (2011)
RACEZ: A Lightweight and Non-Invasive Race Detection Tool for Production Applications
Neil Vachharajani
Tianwei Sheng
Stephane Eranian
ICSE, ACM (2011), pp. 401-410