Diagnosing performance changes by comparing request flows
Venue
8th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2011)
Publication Year
2011
Authors
Raja R. Sambasivan, Alice X. Zheng, Michael De Rosa, Elie Krevat, Spencer Whitman, Michael Stroucken, William Wang, Lianghong Xu, Gregory R. Ganger
Abstract
The causes of performance changes in a distributed system often elude even its
developers. This paper develops a new technique for gaining insight into such
changes: comparing system behaviours from two executions (e.g., of two system
versions or time periods). Building on end-to-end request flow tracing within and
across components, algorithms are described for identifying and ranking changes in
the flow and/or timing of request processing. The implementation of these
algorithms in a tool called Spectroscope is described and evaluated. Six case
studies are presented of using Spectroscope to diagnose performance changes in a
distributed storage system caused by code changes, configuration modifications, and
component degradations, demonstrating the value and efficacy of comparing request
flows. Preliminary experiences of using Spectroscope to diagnose performance
changes within Google are also presented.
