Uncertainty in Aggregate Estimates from Sampled Distributed Traces
Venue
2012 Workshop on Managing Systems Automatically and Dynamically, USENIX
Publication Year
2012
Authors
Nate Coehlo, Arif Merchant, Murray Stokely
Abstract
Tracing mechanisms in distributed systems give important insight into system
properties and are usually sampled to control overhead. At Google, Dapper [8] is
the always-on system for distributed tracing and performance analysis, and it
samples fractions of all RPC traffic. Because of implementation difficulty, excessive data
volume, or a lack of perfect foresight, some system quantities of interest are not
measured directly; in those cases, Dapper samples can be aggregated to
estimate those quantities in the short or long term. Here we find unbiased variance
estimates of linear statistics over RPCs, taking into account all layers of
sampling that occur in Dapper, and allowing us to quantify the sampling uncertainty
in the aggregate estimates. We apply this methodology to the problem of assigning
jobs and data to Google datacenters, using estimates of the resulting
cross-datacenter traffic as an optimization criterion, and also to the detection of
change points in access patterns to certain data partitions.
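
As a rough illustration of what an unbiased variance estimate for a linear statistic over sampled RPCs can look like, the sketch below uses a textbook Horvitz-Thompson estimator under a single layer of independent (Bernoulli) sampling at a known rate. This is a simplifying assumption made for illustration only: the paper accounts for all layers of sampling in Dapper, which this sketch does not attempt to reproduce. The function name `ht_total_and_variance`, the parameter `p`, and the byte counts in the demo are hypothetical.

```python
import random


def ht_total_and_variance(sampled_values, p):
    """Estimate a population total and its sampling variance.

    Assumes each item (e.g. one RPC's byte count) was kept independently
    with the same known probability p. This single-layer model is an
    illustrative assumption, not the multi-layer scheme used by Dapper.
    """
    if not 0 < p <= 1:
        raise ValueError("sampling probability p must be in (0, 1]")
    # Horvitz-Thompson total: weight each sampled value by 1/p.
    total = sum(y / p for y in sampled_values)
    # Unbiased estimator of Var(total) under independent sampling.
    variance = sum((1 - p) / p ** 2 * y * y for y in sampled_values)
    return total, variance


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-RPC byte counts; not data from the paper.
    population = [rng.randint(100, 10_000) for _ in range(50_000)]
    p = 0.01
    sample = [y for y in population if rng.random() < p]
    est, var = ht_total_and_variance(sample, p)
    print("true total:     ", sum(population))
    print("estimated total:", round(est))
    print("std. error:     ", round(var ** 0.5))
```

The standard error printed by the demo gives a sense of how far the aggregate estimate may sit from the true total purely because of sampling, which is the kind of uncertainty the abstract refers to.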