LatLong: Diagnosing Wide-Area Latency Changes for CDNs
Venue
IEEE Transactions on Network and Service Management, vol. 9 (2012) (to appear)
Publication Year
2012
Authors
Yaping Zhu, Benjamin Helsley, Jennifer Rexford, Aspi Siganporia, Sridhar Srinivasan
Abstract
Minimizing user-perceived latency is crucial for Content Distribution Networks
(CDNs) hosting interactive services. Latency may increase for many reasons, such as
interdomain routing changes and the CDN's own load-balancing policies. CDNs need
greater visibility into the causes of latency increases, so they can adapt by
directing traffic to different servers or paths. In this paper, we propose
techniques for CDNs to diagnose large latency increases, based on passive
measurements of performance, traffic, and routing. Separating the many causes from
the effects is challenging. We propose a decision tree for classifying latency
changes, and determine how to distinguish traffic shifts from increases in latency
for existing servers, routers, and paths. Another challenge is that network
operators group related clients to reduce measurement and control overhead, but the
clients in a region may use multiple servers and paths during a measurement
interval. We propose metrics that quantify the latency contributions across sets of
servers and routers. Analyzing a month of data from Google's CDN, we find that
nearly 1% of the daily latency changes increase delay by more than 100 msec. More
than 40% of these increases coincide with interdomain routing changes, and more
than one-third involve a shift in traffic to different servers. This is the first
work to diagnose latency problems in a large, operational CDN from purely passive
measurements. Through case studies of individual events, we identify research
challenges for measuring and managing wide-area latency for CDNs.
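
The distinction the abstract draws between traffic shifts and latency increases at existing servers can be illustrated with a small sketch. This is an assumption-laden illustration, not the paper's actual metrics or decision tree: the function names, the input layout, and the rule of picking the larger component are all hypothetical. It relies only on the identity f2*r2 - f1*r1 = f2*(r2 - r1) + (f2 - f1)*r1, summed over servers, where f is the fraction of a client region's traffic sent to a server and r is that server's mean round-trip time. The 100 ms threshold is the one quoted in the abstract.

```python
def decompose_latency_change(before, after):
    """Split the change in a region's mean latency into two parts.

    before/after: dict mapping server name -> (traffic_fraction, mean_rtt_ms).
    Assumption: every server has a latency measurement in both intervals.
    Returns (latency_shift_ms, traffic_shift_ms); their sum is the total change.
    """
    if before.keys() != after.keys():
        raise ValueError("both intervals must cover the same servers")
    latency_shift = 0.0  # change due to servers getting slower or faster
    traffic_shift = 0.0  # change due to traffic moving between servers
    for server in before:
        f1, r1 = before[server]
        f2, r2 = after[server]
        latency_shift += f2 * (r2 - r1)
        traffic_shift += (f2 - f1) * r1
    return latency_shift, traffic_shift


def classify_change(before, after, threshold_ms=100.0):
    """Hypothetical rule: label a large increase by its larger component."""
    latency_shift, traffic_shift = decompose_latency_change(before, after)
    if latency_shift + traffic_shift <= threshold_ms:
        return "no large increase"
    return "traffic shift" if traffic_shift >= latency_shift else "latency shift"


# Traffic moves from a nearby server A to a distant server B.
moved_before = {"A": (0.8, 50.0), "B": (0.2, 250.0)}
moved_after = {"A": (0.2, 50.0), "B": (0.8, 250.0)}

# Traffic split is unchanged, but server A becomes much slower.
slow_before = {"A": (0.5, 50.0), "B": (0.5, 60.0)}
slow_after = {"A": (0.5, 300.0), "B": (0.5, 60.0)}

print(classify_change(moved_before, moved_after))
print(classify_change(slow_before, slow_after))
```

In the first scenario the whole increase comes from the traffic-shift term, since no server's latency changed; in the second it comes from the latency-shift term, since the traffic split stayed the same. A real diagnosis would also need to handle servers that appear in only one interval and to bring in routing data, which this sketch omits.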
