Yuchung Cheng 鄭又中

Yuchung Cheng is a software engineer at Google. He works on TCP performance for the Web and Google services. He obtained a Ph.D. from the University of California, San Diego, and a B.S. from National Taiwan University.
Authored Publications
    Fathom: Understanding Datacenter Application Network Performance
    SIGCOMM 2023
    We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into host and network components with kernel and RPC stack instrumentation. It records these detailed latency metrics, along with detailed transport connection state, for every sampled RPC. This lets us determine whether the completion is constrained by the client, the network, or the server. To scale while enabling analysis, we also aggregate samples into distributions that retain multi-dimensional breakdowns. This provides us with a macroscopic view of individual services. Fathom runs globally in our datacenters for all production traffic, where it monitors billions of TCP connections 24x7. For five years Fathom has been our primary tool for troubleshooting service network issues and assessing network infrastructure changes. We present case studies to show how it has helped us improve our production services.
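    A minimal sketch in Python of the kind of per-RPC breakdown described above; the field names and the "largest component wins" attribution are illustrative assumptions, not Fathom's actual schema or logic.

    from dataclasses import dataclass

    @dataclass
    class RpcSample:
        client_s: float   # latency attributed to the client host/stack
        network_s: float  # latency attributed to the network transfer
        server_s: float   # latency attributed to server processing

    def bottleneck(sample):
        """Name the component that dominates this sampled RPC's completion time."""
        parts = {"client": sample.client_s, "network": sample.network_s, "server": sample.server_s}
        return max(parts, key=parts.get)

    # Example: a 1.5 ms network share dominates 0.2 ms client and 0.4 ms server shares.
    print(bottleneck(RpcSample(client_s=0.0002, network_s=0.0015, server_s=0.0004)))  # -> network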
    Improving Network Availability with Protective ReRoute
    Abdul Kabbani
    Van Jacobson
    Jim Winget
    Brad Morrey
    Uma Parthavi Moravapalle
    Steven Knight
    SIGCOMM 2023
    We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63-84%. This is the equivalent of adding 0.4-0.8 "nines" of availability.
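    A minimal sketch in Python of the PRR reaction described above, assuming the host can stamp an arbitrary 20-bit IPv6 FlowLabel on a flow's packets; an illustration of the idea, not Google's implementation.

    import random

    FLOWLABEL_BITS = 20  # width of the IPv6 FlowLabel field

    class Flow:
        def __init__(self):
            self.flow_label = random.getrandbits(FLOWLABEL_BITS)

        def on_retransmission_timeout(self):
            """Connectivity-failure signal: repath by picking a new FlowLabel.

            ECMP switches include the FlowLabel in their flow hash, so a new
            label usually maps the flow onto a different network path that may
            avoid the outage, with no application involvement.
            """
            old = self.flow_label
            while self.flow_label == old:  # ensure the label actually changes
                self.flow_label = random.getrandbits(FLOWLABEL_BITS)

    f = Flow()
    before = f.flow_label
    f.on_retransmission_timeout()
    print(f"repathed: {before:#x} -> {f.flow_label:#x}")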
    PLB: Congestion Signals are Simple and Effective for Network Load Balancing
    Abdul Kabbani
    Junhua Yan
    Kira Yin
    Masoud Moshref
    Mubashir Adnan Qureshi
    Qiaobin Fu
    Van Jacobson
    SIGCOMM (2022)
    We describe our experience with PLB, a host-based load balancing design for modern networks. PLB randomly changes the paths of connections that experience congestion, preferring idle periods to minimize transport interactions. It does so by changing the IPv6 FlowLabel on the packets of a connection, which switches include as part of the ECMP flow hash. Across many hosts, this action drives down the number of hotspots in the network, while separating short RPCs from elephant flows to keep their completion time low. We use PLB fleetwide in Google networks for TCP and PonyExpress (RDMA-like) traffic. We find it to be simple, general, and effective. It was easy to deploy, co-existing with other traffic, requiring only small transport modifications and little of switches, and needing no application changes. And it has produced large gains across the board, for multiple transports and from datacenter through backbone networks. After deploying PLB, the median utilization imbalance of busy switches in Google datacenter networks fell by 60% and packet drops correspondingly fell by 33%. At hosts, the tail latency (99th percentile) of short RPCs fell by up to 25%.
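    A sketch in Python of PLB-style repathing logic; the use of ECN marks as the congestion signal and the specific thresholds are assumptions made for illustration, not the deployed parameters.

    import random

    ECN_FRACTION_THRESHOLD = 0.5     # a round counts as congested above this (assumed)
    CONGESTED_ROUNDS_TO_REPATH = 3   # consecutive congested rounds before repathing (assumed)

    class PlbConnection:
        def __init__(self):
            self.flow_label = random.getrandbits(20)
            self.congested_rounds = 0
            self.repath_pending = False

        def on_round_end(self, ecn_marked, delivered):
            """Called once per round trip with that round's delivery statistics."""
            congested = delivered > 0 and ecn_marked / delivered > ECN_FRACTION_THRESHOLD
            self.congested_rounds = self.congested_rounds + 1 if congested else 0
            if self.congested_rounds >= CONGESTED_ROUNDS_TO_REPATH:
                self.repath_pending = True  # defer the path change to an idle period

        def on_idle(self):
            """Apply a deferred repath while the connection is idle to avoid disruption."""
            if self.repath_pending:
                self.flow_label = random.getrandbits(20)
                self.congested_rounds = 0
                self.repath_pending = False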
    RFC 8985 - The RACK-TLP Loss Detection Algorithm for TCP
    Internet Engineering Task Force (IETF) (2021)
    This document presents the RACK-TLP loss detection algorithm for TCP. RACK-TLP uses per-segment transmit timestamps and selective acknowledgments (SACKs) and has two parts. Recent Acknowledgment (RACK) starts fast recovery quickly using time-based inferences derived from acknowledgment (ACK) feedback, and Tail Loss Probe (TLP) leverages RACK and sends a probe packet to trigger ACK feedback to avoid retransmission timeout (RTO) events. Compared to the widely used duplicate acknowledgment (DupAck) threshold approach, RACK-TLP detects losses more efficiently when there are application-limited flights of data, lost retransmissions, or data packet reordering events. It is intended to be an alternative to the DupAck threshold approach.
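    A simplified sketch in Python of RACK's time-based inference, with the bookkeeping omitted: a segment is declared lost once a segment sent later has been delivered and a small reordering window past its expected delivery time has elapsed.

    def rack_detect_losses(unacked_segments, rack_xmit_ts, rtt, reo_wnd, now):
        """unacked_segments: dicts with 'xmit_ts' (send time of each outstanding segment).
        rack_xmit_ts: send time of the most recently sent segment that has been (S)ACKed.
        rtt: round-trip time of that delivery; reo_wnd: reordering window, in seconds."""
        lost = []
        for seg in unacked_segments:
            sent_before_a_delivered_segment = seg["xmit_ts"] <= rack_xmit_ts
            deadline = seg["xmit_ts"] + rtt + reo_wnd
            if sent_before_a_delivered_segment and now >= deadline:
                lost.append(seg)  # recover now, without waiting for DupAcks or an RTO
        return lost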
    BBR: Congestion-Based Congestion Control
    C. Stephen Gunn
    Van Jacobson
    Communications of the ACM, vol. 60 (2017), pp. 58-66
    By all accounts, today's Internet is not moving data as well as it should. Most of the world's cellular users experience delays of seconds to minutes; public Wi-Fi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances. These problems result from a design choice made when TCP congestion control was created in the 1980s: interpreting packet loss as "congestion." This equivalence was true at the time, but only because of technology limitations, not first principles. As NICs (network interface controllers) evolved from Mbps to Gbps and memory chips from KB to GB, the relationship between packet loss and congestion became more tenuous. Today TCP's loss-based congestion control, even with the current best of breed, CUBIC, is the primary cause of these problems. When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates.
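    A greatly simplified sketch in Python of BBR's core idea (the ProbeBW/ProbeRTT state machine and gain cycling are omitted): model the path from the maximum recent delivery rate and the minimum recent RTT, then pace sending around that model instead of reacting to loss.

    from collections import deque

    class BbrModel:
        def __init__(self, bw_window=10, cwnd_gain=2.0):
            self.bw_samples = deque(maxlen=bw_window)  # recent delivery-rate samples, bytes/s
            self.min_rtt = float("inf")                # seconds
            self.cwnd_gain = cwnd_gain

        def on_ack(self, delivered_bytes, interval_s, rtt_s):
            """Update the path model from each ACK's delivery-rate and RTT sample."""
            if interval_s > 0:
                self.bw_samples.append(delivered_bytes / interval_s)
            self.min_rtt = min(self.min_rtt, rtt_s)

        def btl_bw(self):
            return max(self.bw_samples, default=0.0)   # estimated bottleneck bandwidth

        def bdp_bytes(self):
            return 0.0 if self.min_rtt == float("inf") else self.btl_bw() * self.min_rtt

        def pacing_rate(self, gain=1.0):
            return gain * self.btl_bw()                # bytes per second

        def cwnd_bytes(self):
            return self.cwnd_gain * self.bdp_bytes()   # keep inflight near a small multiple of the BDP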
    An Internet-Wide Analysis of Traffic Policing
    Tobias Flach
    Luis Pedrosa
    Tayeb Karim
    Ethan Katz-Bassett
    Ramesh Govindan
    SIGCOMM (2016)
    Large flows like videos consume significant bandwidth. Some ISPs actively manage these high volume flows with techniques like policing, which enforces a flow rate by dropping excess traffic. While the existence of policing is well known, our contribution is an Internet-wide study quantifying its prevalence and impact on video quality metrics. We developed a heuristic to identify policing from server-side traces and built a pipeline to deploy it at scale on hundreds of servers worldwide within one of the largest online content providers. Using a dataset of 270 billion packets served to 28,400 client ASes, we find that, depending on region, up to 7% of lossy transfers are policed. Loss rates are on average 6× higher when a trace is policed, and it impacts video playback quality. We show that alternatives to policing, like pacing and shaping, can achieve traffic management goals while avoiding the deleterious effects of policing.
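    For intuition, a Python model of the token-bucket policer this study measures: traffic within the configured rate passes, while excess packets are dropped rather than queued (a shaper would queue them instead). This models the policer itself, not the paper's server-side detection heuristic.

    class TokenBucketPolicer:
        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0      # token refill rate, bytes per second
            self.capacity = burst_bytes     # bucket size (allowed burst)
            self.tokens = burst_bytes
            self.last = 0.0

        def allow(self, now, pkt_bytes):
            """Return True if the packet is forwarded, False if it is policed (dropped)."""
            # Refill tokens for the elapsed time, capped at the bucket capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= pkt_bytes:
                self.tokens -= pkt_bytes
                return True
            return False  # excess traffic is dropped, not delayed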
    BBR: Congestion-Based Congestion Control
    C. Stephen Gunn
    Van Jacobson
    ACM Queue, vol. 14, September-October (2016), pp. 20-53
    By all accounts, today's Internet is not moving data as well as it should. Most of the world's cellular users experience delays of seconds to minutes; public Wi-Fi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances. These problems result from a design choice made when TCP congestion control was created in the 1980s: interpreting packet loss as "congestion." This equivalence was true at the time, but only because of technology limitations, not first principles. As NICs (network interface controllers) evolved from Mbps to Gbps and memory chips from KB to GB, the relationship between packet loss and congestion became more tenuous. Today TCP's loss-based congestion control, even with the current best of breed, CUBIC, is the primary cause of these problems. When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates.
    RFC 7413 - TCP Fast Open
    Jerry Chu
    Sivasankar Radhakrishnan
    Arvind Jain
    Internet Engineering Task Force (IETF) (2014)
    This document describes an experimental TCP mechanism called TCP Fast Open (TFO). TFO allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiving end during the initial connection handshake, saving up to one full round-trip time (RTT) compared to standard TCP, which requires a three-way handshake (3WHS) to complete before data can be exchanged. However, TFO deviates from standard TCP semantics, since the data in the SYN could be replayed to an application in some rare circumstances. Applications should not use TFO unless they can tolerate this issue, as detailed in the Applicability section.
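    A sketch in Python of a TFO client on Linux; it assumes a kernel with TFO enabled (the net.ipv4.tcp_fastopen sysctl) and a server that supports it. With a cached TFO cookie, the MSG_FASTOPEN sendto() places the request bytes in the SYN, saving a round trip; per the caveat above, use this only for requests that tolerate possible replay.

    import socket

    MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)  # Linux-only flag

    def tfo_request(host, port, request):
        """Open a TCP connection with Fast Open and return the server's response."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # sendto() with MSG_FASTOPEN initiates the handshake; with a valid
            # cached cookie the request data rides in the SYN itself.
            s.sendto(request, MSG_FASTOPEN, (host, port))
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks)
        finally:
            s.close()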
    Reducing Web Latency: the Virtue of Gentle Aggression
    Tobias Flach
    Barath Raghavan
    Shuai Hao
    Ethan Katz-Bassett
    Ramesh Govindan
    Proceedings of the ACM Conference of the Special Interest Group on Data Communication (SIGCOMM '13), ACM (2013)
    To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP's current mechanisms fundamentally limit latency improvements. We performed a measurement study of a large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP's timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP that judiciously use redundant transmissions to minimize timeout-driven recovery. Proactive, Reactive, and Corrective are three qualitatively different, easily deployable mechanisms that (1) proactively recover from losses, (2) recover from them as quickly as possible, and (3) reconstruct packets to mask loss. Crucially, the mechanisms are compatible both with middleboxes and with TCP's existing congestion control and loss recovery. Our large-scale experiments on Google's production network that serves billions of flows demonstrate a 23% decrease in the mean and 47% in 99th percentile latency over today's TCP.
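    A toy Python illustration of the "reconstruct packets to mask loss" idea behind Corrective: send an XOR parity over a small block of packets so a single lost packet in the block can be rebuilt without waiting for a retransmission. The XOR block code and block size here are illustrative assumptions, not the paper's exact encoding.

    from functools import reduce

    def xor_parity(packets):
        """XOR all packets together after padding them to a common length."""
        size = max(len(p) for p in packets)
        padded = [p.ljust(size, b"\x00") for p in packets]
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), padded)

    def recover(received, parity):
        """Rebuild a single missing packet (marked None) from the parity packet."""
        missing = [i for i, p in enumerate(received) if p is None]
        if len(missing) != 1:
            return received               # nothing lost, or too many losses to mask
        present = [p for p in received if p is not None]
        received[missing[0]] = xor_parity(present + [parity])  # padded copy of the lost packet
        return received

    pkts = [b"alpha", b"bravo", b"charl"]
    parity = xor_parity(pkts)
    print(recover([b"alpha", None, b"charl"], parity))  # the middle packet is reconstructed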
    RFC 6937 - Proportional Rate Reduction for TCP
    Matt Mathis
    Internet Engineering Task Force (IETF) (2013)
    This document describes an experimental Proportional Rate Reduction (PRR) algorithm as an alternative to the widely deployed Fast Recovery and Rate-Halving algorithms. These algorithms determine the amount of data sent by TCP during loss recovery. PRR minimizes excess window adjustments, and the actual window size at the end of recovery will be as close as possible to the ssthresh, as determined by the congestion control algorithm.
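    The heart of the algorithm, sketched in Python from the RFC's per-ACK computation (showing only the proportional branch used while the data in flight is still above ssthresh, and omitting the reduction-bound branch that takes over below it):

    import math

    def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
        """Segments the sender may transmit on this ACK during fast recovery.

        prr_delivered: data delivered to the receiver since recovery began
        prr_out:       data transmitted by the sender since recovery began
        ssthresh:      target congestion window for the end of recovery
        recover_fs:    data outstanding when recovery started
        """
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
        return max(sndcnt, 0)

    # Example: recovering toward ssthresh = RecoverFS / 2, each delivery "pays
    # for" roughly half a transmission, easing the window down smoothly.
    print(prr_sndcnt(prr_delivered=10, prr_out=4, ssthresh=10, recover_fs=20))  # -> 1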
    packetdrill: Scriptable Network Stack Testing, from Sockets to Packets
    Lawrence Brakmo
    Matt Mathis
    Barath Raghavan
    Hsiao-keng Jerry Chu
    Tom Herbert
    Proceedings of the USENIX Annual Technical Conference (USENIX ATC 2013), USENIX, pp. 213-218
    Testing today's increasingly complex network protocol implementations can be a painstaking process. To help meet this challenge, we developed packetdrill, a portable, open-source scripting tool that enables testing the correctness and performance of entire TCP/UDP/IP network stack implementations, from the system call layer to the hardware network interface, for both IPv4 and IPv6. We describe the design and implementation of the tool, and our experiences using it to execute 657 test cases. The tool was instrumental in our development of three new features for Linux TCP (Early Retransmit, Fast Open, and Loss Probes) and allowed us to find and fix 10 bugs in Linux. Our team uses packetdrill in all phases of the development process for the kernel used in one of the world's largest Linux installations.
    RFC 6928 - Increasing TCP's Initial Window
    H.K. Jerry Chu
    Matt Mathis
    Internet Engineering Task Force (IETF) (2013)
    This document proposes an experiment to increase the permitted TCP initial window (IW) from between 2 and 4 segments, as specified in RFC 3390, to 10 segments, with a fallback to the existing recommendation when performance issues are detected. It discusses the motivation behind the increase, the advantages and disadvantages of the higher initial window, and presents results from several large-scale experiments showing that the higher initial window improves the overall performance of many web services without resulting in a congestion collapse. The document closes with a discussion of usage and deployment for further experimental purposes recommended by the IETF TCP Maintenance and Minor Extensions (TCPM) working group.
    Trickle: Rate Limiting YouTube Video Streaming
    Monia Ghobadi
    Matt Mathis
    Proceedings of the USENIX Annual Technical Conference (2012), pp. 6
    YouTube traffic is bursty. These bursts trigger packet losses and stress router queues, causing TCP's congestion-control algorithm to kick in. In this paper, we introduce Trickle, a server-side mechanism that uses TCP to rate limit YouTube video streaming. Trickle paces the video stream by placing an upper bound on TCP's congestion window as a function of the streaming rate and the round-trip time. We evaluated Trickle on YouTube production data centers in Europe and India and analyzed its impact on losses, bandwidth, RTT, and video buffer under-run events. The results show that Trickle reduces the average TCP loss rate by up to 43% and the average RTT by up to 28% while maintaining the streaming rate requested by the application.
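    A sketch in Python of the clamp described above, which bounds the congestion window by roughly the streaming rate times the RTT; the headroom factor is an illustrative assumption, not the paper's tuned value.

    def trickle_cwnd_clamp(target_rate_bps, rtt_s, mss_bytes=1460, headroom=1.2):
        """Upper bound on cwnd, in segments, for a given streaming rate and RTT."""
        bytes_per_rtt = (target_rate_bps / 8.0) * rtt_s   # data needed per round trip
        return max(2, int(headroom * bytes_per_rtt / mss_bytes))

    # Example: a 2.5 Mbps stream over a 100 ms path needs only ~25 segments of
    # window, far less than an unclamped cwnd would reach during a burst.
    print(trickle_cwnd_clamp(2_500_000, 0.100))  # -> 25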
    TCP Fast Open
    Sivasankar Radhakrishnan
    Jerry Chu
    Arvind Jain
    Barath Raghavan
    Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), ACM (2011)
    Today's web services are dominated by TCP flows so short that they terminate a few round trips after handshaking; this handshake is a significant source of latency for such flows. In this paper we describe the design, implementation, and deployment of the TCP Fast Open protocol, a new mechanism that enables data exchange during TCP's initial handshake. In doing so, TCP Fast Open decreases application network latency by one full round-trip time, decreasing the delay experienced by such short TCP transfers. We address the security issues inherent in allowing data exchange during the three-way handshake, which we mitigate using a security token that verifies IP address ownership. We detail other fall-back defense mechanisms and address issues we faced with middleboxes, backwards compatibility for existing network stacks, and incremental deployment. Based on traffic analysis and network emulation, we show that TCP Fast Open would decrease HTTP transaction network latency by 15% and whole-page load time by over 10% on average, and in some cases by up to 40%.
    Proportional Rate Reduction for TCP
    Matt Mathis
    Monia Ghobadi
    Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011
    Packet losses increase latency for Web users. Fast recovery is a key mechanism for TCP to recover from packet losses. In this paper, we explore some of the weaknesses of the standard algorithm described in RFC 3517 and the non-standard algorithms implemented in Linux. We find that these algorithms deviate from their intended behavior in the real world due to the combined effect of short flows, application stalls, burst losses, acknowledgment (ACK) loss and reordering, and stretch ACKs. Linux suffers from excessive congestion window reductions, while RFC 3517 transmits large bursts under high losses; both harm the rest of the flow and increase Web latency. Our primary contribution is a new design to control transmission in fast recovery called proportional rate reduction (PRR). PRR recovers from losses quickly, smoothly, and accurately by pacing out retransmissions across received ACKs. In addition to PRR, we evaluate the TCP early retransmit (ER) algorithm, which lowers the duplicate acknowledgment threshold for short transfers, and show that delaying early retransmissions for a short interval is effective in avoiding spurious retransmissions in the presence of a small degree of reordering. PRR and ER reduce the TCP latency of connections experiencing losses by 3-10% depending on the response size. Based on our instrumentation on Google Web and YouTube servers in the U.S. and India, we also present key statistics on the nature of TCP retransmissions.
    An Argument for Increasing TCP's Initial Congestion Window
    Jerry Chu
    Tom Herbert
    Amit Agarwal
    Arvind Jain
    Natalia Sutin
    ACM SIGCOMM Computer Communications Review, vol. 40 (2010), pp. 27-33
    TCP flows start with an initial congestion window of at most four segments or approximately 4KB of data. Because most Web transactions are short-lived, the initial congestion window is a critical TCP parameter in determining how quickly flows can finish. While global network access speeds have increased dramatically on average in the past decade, the standard value of TCP's initial congestion window has remained unchanged. In this paper, we propose to increase TCP's initial congestion window to at least ten segments (about 15KB). Through large-scale Internet experiments, we quantify the latency benefits and costs of using a larger window, as functions of network bandwidth, round-trip time (RTT), bandwidth-delay product (BDP), and nature of applications. We show that the average latency of HTTP responses improved by approximately 10%, with the largest benefits being demonstrated in high-RTT and high-BDP networks. The latency of low-bandwidth networks also improved by a significant amount in our experiments. The average retransmission rate increased by a modest 0.5%, with most of the increase coming from applications that effectively circumvent TCP's slow start algorithm by using multiple concurrent connections. Based on the results from our experiments, we believe the initial congestion window should be at least ten segments, and that the same be investigated for standardization by the IETF.
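    A back-of-the-envelope sketch in Python of why the initial window matters for short responses: under slow start (ignoring losses and delayed ACKs, and assuming 1460-byte segments), a roughly 15KB response needs several round trips with a small initial window but only one with ten segments.

    def slow_start_rtts(segments, initial_window):
        """Round trips of data transfer needed to deliver `segments` segments."""
        sent, cwnd, rtts = 0, initial_window, 0
        while sent < segments:
            sent += cwnd   # send a full window this round trip
            cwnd *= 2      # slow start roughly doubles the window each RTT
            rtts += 1
        return rtts

    for iw in (3, 10):
        print(f"IW={iw}: {slow_start_rtts(10, iw)} RTT(s) of data transfer")
    # -> IW=3: 3 RTT(s); IW=10: 1 RTT(s)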
    Monkey See, Monkey Do: A Tool for TCP Tracing and Replaying
    Stefan Savage
    Geoffrey M. Voelker
    USENIX Annual Technical Conference, General Track (2004)
    Accuracy characterization for metropolitan-scale Wi-Fi localization
    Yatin Chawathe
    Anthony LaMarca
    John Krumm
    MobiSys (2005), pp. 233-245