Publication Data
Diagnosing performance changes by comparing request flows
Abstract: The causes of performance changes in a distributed system
often elude even its developers. This paper develops a new technique for gaining
insight into such changes: comparing system behaviours from two executions (e.g., of
two system versions or time periods). Building on end-to-end request-flow tracing
within and across components, algorithms are described for identifying and ranking
changes in the flow and/or timing of request processing. The implementation of these
algorithms in a tool called Spectroscope is described and evaluated. Six case studies
are presented of using Spectroscope to diagnose performance changes in a distributed
storage system caused by code changes, configuration modifications, and component
degradations, demonstrating the value and efficacy of comparing request flows.
Preliminary experiences of using Spectroscope to diagnose performance changes within
Google are also presented.
