Uncertainty in Aggregate Estimates from Sampled Distributed Traces
Tracing mechanisms in distributed systems give important insight into system properties and are usually sampled to control overhead. At Google, Dapper is the always-on system for distributed tracing and performance analysis, and it samples fractions of all RPC traffic. Because of implementation difficulty, excessive data volume, or a lack of perfect foresight, some system quantities of interest are not measured directly; in such cases, Dapper samples can be aggregated to estimate those quantities in the short or long term. Here we derive unbiased variance estimates of linear statistics over RPCs, taking into account all layers of sampling that occur in Dapper, which allows us to quantify the sampling uncertainty in the aggregate estimates. We apply this methodology to the problem of assigning jobs and data to Google datacenters, using estimates of the resulting cross-datacenter traffic as an optimization criterion, and also to the detection of change points in access patterns to certain data partitions.
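To make the idea of an unbiased variance estimate for a linear statistic concrete, the following is a minimal sketch for a much simpler setting than the one treated here: a single layer of independent (Poisson) sampling in which each RPC is kept with a known probability. It uses the standard Horvitz-Thompson estimator of a total and its textbook variance estimator, not the multi-layer Dapper analysis itself. The function name `ht_total_and_variance` and the input layout are hypothetical, chosen only for illustration.

```python
def ht_total_and_variance(samples):
    """Estimate a population total and its sampling variance.

    Assumption (not from the paper): each RPC is sampled independently
    with a known inclusion probability, i.e. one layer of Poisson sampling.

    samples: iterable of (value, prob) pairs for the RPCs that were sampled,
             where value is the per-RPC quantity (e.g. bytes sent) and
             prob is that RPC's inclusion probability in (0, 1].
    Returns (total_estimate, variance_estimate).
    """
    total = 0.0
    variance = 0.0
    for value, prob in samples:
        if not 0.0 < prob <= 1.0:
            raise ValueError("inclusion probability must be in (0, 1]")
        # Horvitz-Thompson: weight each sampled value by 1 / prob.
        total += value / prob
        # Unbiased estimate of this RPC's contribution to the variance
        # of the total under independent sampling.
        variance += (1.0 - prob) / prob**2 * value**2
    return total, variance


if __name__ == "__main__":
    # Two sampled RPCs: one kept with probability 0.5, one always kept.
    est, var = ht_total_and_variance([(100.0, 0.5), (50.0, 1.0)])
    print(est, var)
```

An RPC that is always kept (probability 1) contributes no sampling variance, while rarely sampled RPCs with large values dominate the uncertainty. Extending this to several dependent layers of sampling is what requires the more careful treatment described above.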