Being on-call is a critical duty that many operations and engineering teams must
undertake to keep their services reliable and available. However, there
are several pitfalls in how on-call rotations and responsibilities are organized
that, if not avoided, can have serious consequences for both the services and the
teams. We describe the primary tenets of the approach to on-call that Google’s
Site Reliability Engineers have developed over the years, and explain how that
approach has led to reliable services and a sustainable workload over time.