The general trend in software over the last several years has been to give every system
an API and turn every product into a platform. When these systems only served end
users, their reliability depended solely on how well we did our jobs as SREs.
Increasingly, however, our customers' perceptions of our reliability are being
driven by the quality of the software they bring to our platforms. The normal
boundaries between our platforms and our customers are blurring, and it's
getting harder to deliver a consistent end-user reliability experience. In this
talk we'll discuss a provocative idea: that as SREs we should take joint operational
responsibility and go on-call for the systems our customers build on our platforms.
We'll discuss the specific technical and operational challenges in this approach
and the results of an experiment we're running at Google to address this need.
Finally, we'll look ahead and consider what these changes
mean for the future of SRE as a discipline.