Jeffrey C. Mogul
Jeff Mogul works on fast, cheap, reliable, and flexible networking infrastructure for Google. Until 2013, he was a Fellow at HP Labs, doing research primarily on computer networks and operating systems issues for enterprise and cloud computer systems; previously, he worked at the DEC/Compaq Western Research Lab. He received his PhD from Stanford in 1986, an MS from Stanford in 1980, and an SB from MIT in 1979. He is an ACM Fellow. Jeff is the author or co-author of several Internet Standards; he contributed extensively to the HTTP/1.1 specification. He was an associate editor of Internetworking: Research and Experience, and has been the chair or co-chair of a variety of conferences and workshops, including SIGCOMM, OSDI, NSDI, USENIX, HotOS, and ANCS.
You can find a mostly up-to-date CV at http://jmogul.com/mogulcv.pdf
Authored Publications
While many network research papers address issues of deployability, with a few exceptions, this has been limited to protocol compatibility or switch-resource constraints, such as flow table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: physical deployability. The traditional metrics of network "goodness" mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters.
Change Management in Physical Network Lifecycle Automation
Virginia Beauregard
Kevin Grant
Angus Griffith
Jahangir Hasan
Chen Huang
Quan Leng
Jiayao Li
Alexander Lin
Zhoutao Liu
Ahmed Mansy
Bill Martinusen
Nikil Mehta
Andrew Narver
Anshul Nigham
Melanie Obenberger
Sean Smith
Kurt Steinkraus
Sheng Sun
Edward Thiele
Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc.
We especially have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support.
This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management:
(1) managing conflicts between multiple operations on the same network;
(2) managing conflicts between operations spanning the boundaries between networks;
(3) managing representational changes in the models that drive our automated systems.
These approaches combine both novel software systems and software-engineering practices.
While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
Data-driven Networking Research: models for academic collaboration with Industry (a Google point of view)
Computer Communication Review, vol. 51:4 (2021), pp. 47-49
We (Google's networking teams) would like to increase our collaborations with academic researchers related to data-driven networking research.
There are some significant constraints on our ability to directly share data, and in case not everyone in the community understands these, this document provides a brief summary.
There are some models which can work (primarily, interns and visiting scientists).
We describe some specific areas where we would welcome proposals to work within those models.
Cores that don't count
Proc. 18th Workshop on Hot Topics in Operating Systems (HotOS 2021)
We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often "silent": the only symptom is an erroneous computation.
We refer to a core that develops such behavior as "mercurial." Mercurial cores are extremely rare, but in a large fleet of servers we can observe the correlated disruption they cause, often enough to see them as a distinct problem -- one that will require collaboration between hardware designers, processor vendors, and systems software architects.
This paper is a call-to-action for a new focus in systems research; we speculate about several software-based approaches to mercurial cores, ranging from better detection and isolating mechanisms, to methods for tolerating the silent data corruption they cause.
To reduce cost, datacenter network operators are exploring blocking network designs. An example of such a design is a "spine-free" form of a Fat-Tree, in which pods directly connect to each other, rather than via spine blocks. To maintain application-perceived performance in the face of dynamic workloads, these new designs must be able to reconfigure routing and the inter-pod topology. Gemini is a system designed to achieve these goals on commodity hardware while reconfiguring the network infrequently, rendering these blocking designs practical enough for deployment in the near future.
The key to Gemini is the joint optimization of topology and routing, using as input a robust estimation of future traffic derived from multiple historical traffic matrices. Gemini “hedges” against unpredicted bursts, by spreading these bursts across multiple paths, to minimize packet loss in exchange for a small increase in path lengths. It incorporates a robust decision algorithm to determine when to reconfigure, and whether to use hedging.
Data from tens of production fabrics allows us to categorize these as either low- or high-volatility; these categories seem stable. For the former, Gemini finds topologies and routing with near-optimal performance and cost. For the latter, Gemini’s use of multi-traffic-matrix optimization and hedging avoids the need for frequent topology reconfiguration, with only marginal increases in path length. As a result, Gemini can support existing workloads on these production fabrics using a spine-free topology that is half the cost of the existing topology on these fabrics.
Experiences with Modeling Network Topologies at Multiple Levels of Abstraction
Martin Pool
Xiaoxue Zhao
17th Symposium on Networked Systems Design and Implementation (NSDI) (2020)
Network management is becoming increasingly automated, and automation depends on detailed, explicit representations of data about both the state of a network, and about an operator’s intent for its networks. In particular, we must explicitly represent the desired and actual topology of a network; almost all other network-management data either derives from its topology, constrains how to use a topology, or associates resources (e.g., addresses) with specific places in a topology.
We describe MALT, a Multi-Abstraction-Layer Topology representation, which supports virtually all of our network management phases: design, deployment, configuration, operation, measurement, and analysis. MALT provides interoperability across software systems, and its support for abstraction allows us to explicitly tie low-level network elements to high-level design intent. MALT supports a declarative style that simplifies what-if analysis and testbed support.
We also describe the software base that supports efficient use of MALT, as well as numerous, sometimes painful lessons we have learned about curating the taxonomy for a comprehensive, and evolving, representation for topology.
Cloud customers want reliable, understandable promises from cloud providers that their applications will run reliably and with adequate performance, but today, providers offer only limited guarantees, which creates uncertainty for customers. Providers also must define internal metrics to allow them to operate their systems without violating customer promises or expectations. We explore why these guarantees are hard to define. We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We also suggest that defining guarantees in terms of defense against threats, rather than guarantees for application-visible outcomes, can reduce the complexity of these problems. Overall, we offer a partial framework for thinking about Service Level Objectives (SLOs), and discuss some unsolved challenges.
Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
Shizhen Zhao
Joon Ong
Proc. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2019), USENIX Association (to appear)
Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions of Clos DCNs. Some prior work has assumed that it is impossible to design DCN topologies that are both well-structured (non-random) and incrementally expandable at arbitrary granularities.
We demonstrate that it is indeed possible to design such networks, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos network, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable.
We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the number of "stages" per expansion by about 3.1X -- a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults.
We increasingly depend on the availability of online services, either directly as users, or indirectly, when cloud-provider services support directly-accessed services. The availability of these "visible services" depends in complex ways on the availability of a complex underlying set of invisible infrastructure services.
In our experience, most software engineers lack useful frameworks to create and evaluate designs for individual services that support end-to-end availability in these infrastructures, especially given cost, performance, and other constraints on viable commercial services.
Even given the extensive research literature on techniques for replicated state machines and other fault-tolerance mechanisms, we found little help in this literature for addressing infrastructure-wide availability. Past research has often focused on point solutions, rather than end-to-end ones. In particular, it seems quite difficult to define useful targets for infrastructure-level availability, and then to translate these to design requirements for individual services.
We argue that, in many but not all ways, one can think about availability with the mindset that we have learned to use for security, and we discuss some general techniques that appear useful for implementing and operating high-availability infrastructures. We encourage a shift in emphasis for academic research into availability.
Inferring the Network Latency Requirements of Cloud Tenants
Ramana Rao Kompella
15th Workshop on Hot Topics in Operating Systems (HotOS XV), USENIX Association (2015)
Cloud IaaS and PaaS tenants rely on cloud providers to provide network infrastructures that make the appropriate tradeoff between cost and performance. This can include mechanisms to help customers understand the performance requirements of their applications. Previous research (e.g., Proteus and Cicada) has shown how to do this for network-bandwidth demands, but cloud tenants may also need to meet latency objectives, which in turn may depend on reliable limits on network latency, and its variance, within the cloud provider's infrastructure. On the other hand, if network latency is sufficient for an application, further decreases in latency might add cost without any benefit. Therefore, both tenant and provider have an interest in knowing what network latency is good enough for a given application.
This paper explores several options for a cloud provider to infer a tenant's network-latency demands, with varying tradeoffs between requirements for tenant participation, accuracy of inference, and instrumentation overhead. In particular, we explore the feasibility of a hypervisor-only mechanism, which would work without any modifications to tenant code, even in IaaS clouds.
Condor: Better Topologies through Declarative Design
Brandon Schlinker
Radhika Niranjan Mysore
Sean Smith
Amin Vahdat
Minlan Yu
Ethan Katz-Bassett
Michael Rubin
SIGCOMM '15, Google Inc (2015)
The design space for large, multipath datacenter networks is large and complex, and no one design fits all purposes. Network architects must trade off many criteria to design cost-effective, reliable, and maintainable networks, and typically cannot explore much of the design space. We present Condor, our approach to enabling a rapid, efficient design cycle. Condor allows architects to express their requirements as constraints via a Topology Description Language (TDL), rather than having to directly specify network structures. Condor then uses constraint-based synthesis to rapidly generate candidate topologies, which can be analyzed against multiple criteria. We show that TDL supports concise descriptions of topologies such as fat-trees, BCube, and DCell; that we can generate known and novel variants of fat-trees with simple changes to a TDL file; and that we can synthesize large topologies in tens of seconds. We also show that Condor supports the daunting task of designing multi-phase network expansions that can be carried out on live networks.
Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focussed on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees, which can be composed hierarchically. Parley also supports service-centric weighted sharing of bandwidth in excess of these guarantees. Further, we show how to configure these policies so services can get low latencies even at high network load. We evaluate Parley on a multi-tiered oversubscribed network connecting 90 machines, each with a 10Gb/s network interface, and demonstrate that Parley is able to meet its goals.
Cicada: Introducing Predictive Guarantees for Cloud Networks
Katrina LaCurts
Hari Balakrishnan
Yoshio Turner
6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14), USENIX Association (2014)
Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions using FlowTags
Seyed Kaveh Fayazbakhsh
Luis Chang
Vyas Sekar
Minlan Yu
Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’14), USENIX Association (2014), pp. 533-546
Democratic Resolution of Resource Conflicts Between SDN Control Programs
Alvin AuYoung
Yadi Ma
Sujata Banerjee
Jeongkeun Lee
Puneet Sharma
Yoshio Turner
Chen Liang
CoNEXT '14 Proceedings of the 10th ACM International Conference on emerging Networking Experiments and Technologies, ACM (2014), pp. 391-402
FlowTags: Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions
Seyed Kaveh Fayazbakhsh
Vyas Sekar
Minlan Yu
Proc. ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN), ACM (2013)
Corybantic: towards the modular composition of SDN control programs
Alvin AuYoung
Sujata Banerjee
Lucian Popa
Jeongkeun Lee
Jayaram Mudigonda
Puneet Sharma
Yoshio Turner
Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks (HotNets-XII), ACM (2013)
ElasticSwitch: practical work-conserving bandwidth guarantees for cloud computing
Lucian Popa
Praveen Yalagandula
Sujata Banerjee
Yoshio Turner
Jose Renato Santos
Proceedings of the ACM SIGCOMM 2013 conference, ACM, pp. 351-362
The NIC Is the Hypervisor: Bare-Metal Guests in IaaS Clouds
Jayaram Mudigonda
Jose Renato Santos
Yoshio Turner
14th Workshop on Hot Topics in Operating Systems (HotOS-XIV), USENIX Association (2013)
TweeCards: Tweets Go Postal
Report on the SIGCOMM 2011 conference
John W. Byers
Fadel Adib
Jay Aikat
Danai Chasaki
Ming-Hung Chen
Marshini Chetty
Romain Fontugne
Vijay Gabale
László Gyarmati
Katrina LaCurts
Qi Liao
Marc Mendonca
Trang Cao Minh
S. H. Shah Newaz
Pawan Prakash
Yan Shvartzshnaider
Praveen Yalagandula
Chun-Yu Yang
Computer Communication Review, vol. 42 (2012), pp. 80-96
What we talk about when we talk about cloud network performance
On the Security of Conference and Journal Submission Sites
NetLord: a scalable multi-tenant network architecture for virtualized datacenters
Jayaram Mudigonda
Praveen Yalagandula
Bryan Stiekes
Yanick Pouffary
SIGCOMM (2011), pp. 62-73
DevoFlow: scaling flow management for high-performance networks
Andrew R. Curtis
Jean Tourrilhes
Praveen Yalagandula
Puneet Sharma
Sujata Banerjee
SIGCOMM (2011), pp. 254-265
Report on WREN 2009 -- workshop: research on enterprise networking
Nathan Farrington
Nikhil Handigol
Christoph Mayer
Kok-Kiong Yap
Computer Communication Review, vol. 40 (2010), pp. 44-49
SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies
DevoFlow: cost-effective flow management for high performance enterprise networks
Jean Tourrilhes
Praveen Yalagandula
Puneet Sharma
Andrew R. Curtis
Sujata Banerjee
HotNets (2010), pp. 1
Chimpp: a click-based programming and simulation environment for reconfigurable networking hardware
Operating System Support for NVM+DRAM Hybrid Main Memory
Computer systems research at HP labs
WOWCS: the workshop on organizing workshops, conferences, and symposia for computer systems
Operating Systems Review, vol. 43 (2009), pp. 106-107
Fast switching of threads between cores
Richard D. Strong
Jayaram Mudigonda
Nathan L. Binkert
Dean M. Tullsen
Operating Systems Review, vol. 43 (2009), pp. 35-45
Looking Between the Street Lamps
HotPower (2008)
Before and After WOWCS: A literature survey, A list of papers we wish had been submitted
Open issues in organizing computer systems conferences
Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems
Jayaram Mudigonda
Nathan L. Binkert
Vanish Talwar
IEEE Micro, vol. 28 (2008), pp. 26-41
Auditing to Keep Online Storage Services Honest
WAP5: black-box performance debugging for wide-area systems
Patrick Reynolds
Janet L. Wiener
Marcos Kawazoe Aguilera
Amin Vahdat
WWW (2006), pp. 347-356
Emergent (mis)behavior vs. complex software systems
EuroSys (2006), pp. 293-304
Pip: Detecting the Unexpected in Distributed Systems
Patrick Reynolds
Janet L. Wiener
Mehul A. Shah
Amin Vahdat
NSDI (2006)
SC2D: an alternative to trace anonymization
Operating Systems Should Support Business Change
HotOS (2005)
Remote Direct Memory Access (RDMA) over IP Problem Statement (RFC4297)
HTTP Header Field Registrations (RFC4229)
Predicting Short-Transfer Latency from TCP Arcana: A Trace-based Validation
Martin F. Arlitt
Balachander Krishnamurthy
Internet Measurement Conference (2005), pp. 213-226
Unveiling the transport
Lawrence S. Brakmo
David E. Lowell
Dinesh Subhraveti
Justin Moore
Computer Communication Review, vol. 34 (2004), pp. 99-106
Design, Implementation, and Evaluation of Duplicate Transfer Detection in HTTP
2 P2P or Not 2 P2P?
Mema Roussopoulos
Mary Baker
David S. H. Rosenthal
Thomas J. Giuli
IPTPS (2004), pp. 33-43
Clarifying the fundamentals of HTTP
Softw., Pract. Exper., vol. 34 (2004), pp. 103-134
Utilification
Registration Procedures for Message Header Fields (RFC3864)
Architecture and performance of server-directed transcoding
Björn Knutsson
Honghui Lu
Bryan Hopkins
ACM Trans. Internet Techn., vol. 3 (2003), pp. 392-424
Performance debugging for distributed systems of black boxes
Marcos Kawazoe Aguilera
Janet L. Wiener
Patrick Reynolds
Athicha Muthitacharoen
SOSP (2003), pp. 74-89
TCP Offload Is a Dumb Idea Whose Time Has Come
HotOS (2003), pp. 25-30
Workshop on network-I/O convergence: experience, lessons, implications (NICELI)
Vinay Aggarwal
Olaf Maennel
Allyn Romanow
Computer Communication Review, vol. 33 (2003), pp. 75-80
2 P2P or Not 2 P2P?
Mema Roussopoulos
Mary Baker
David S. H. Rosenthal
Thomas J. Giuli
CoRR, vol. cs.NI/0311017 (2003)
Clarifying the fundamentals of HTTP
WWW (2002), pp. 25-36
Aliasing on the world wide web: prevalence and performance implications
Delta encoding in HTTP (RFC3229)
Instance Digests in HTTP (RFC3230)
The VCDIFF Generic Differencing and Compression Data Format (RFC3284)
Toward a Rigorous Data Type Model for HTTP
HotOS (2001), pp. 176
Rethinking the TCP Nagle algorithm
Server-directed transcoding
Computer Communications, vol. 24 (2001), pp. 155-162
Application performance pitfalls and TCP's Nagle algorithm
Greg Minshall
Yasushi Saito
Ben Verghese
SIGMETRICS Performance Evaluation Review, vol. 27 (2000), pp. 36-44
Pulse-Per-Second API for UNIX-like Operating Systems, Version 1.0 (RFC2783)
Brittle Metrics in Operating Systems Research
Workshop on Hot Topics in Operating Systems (1999), pp. 90-95
Resource Containers: A New Facility for Resource Management in Server Systems
Y10K and Beyond (RFC 2550)
Hypertext Transfer Protocol -- HTTP/1.1 (RFC2616)
Key Differences Between HTTP/1.0 and HTTP/1.1
Balachander Krishnamurthy
David M. Kristol
Computer Networks, vol. 31 (1999), pp. 1737-1751
A Scalable and Explicit Event Delivery Mechanism for UNIX
Gaurav Banga
Peter Druschel
USENIX Annual Technical Conference, General Track (1999), pp. 253-265
Better operating system features for faster network servers
Gaurav Banga
Peter Druschel
SIGMETRICS Performance Evaluation Review, vol. 26 (1998), pp. 23-30
Errata for 'Potential benefits of delta encoding and data compression for HTTP'
Fred Douglis
Anja Feldmann
Balachander Krishnamurthy
Computer Communication Review, vol. 28 (1998), pp. 51-55
Scalable kernel performance for Internet servers under realistic loads
Potential Benefits of Delta Encoding and Data Compression for HTTP
Rate of Change and other Metrics: a Live Study of the World Wide Web
Fred Douglis
Anja Feldmann
Balachander Krishnamurthy
USENIX Symposium on Internet Technologies and Systems (1997)
Exploring the Bounds of Web Latency Reduction from Caching and Prefetching
Darrell D. E. Long
Tom M. Kroeger
USENIX Symposium on Internet Technologies and Systems (1997)
Eliminating Receive Livelock in an Interrupt-Driven Kernel
Simple Hit-Metering and Usage-Limiting for HTTP (RFC2227)
Use and Interpretation of HTTP Version Numbers (RFC2145)
Hypertext Transfer Protocol -- HTTP/1.1 (RFC2068)
Path MTU Discovery for IP version 6 (RFC1981)
Hinted caching in the Web
ACM SIGOPS European Workshop (1996), pp. 103-108
Eliminating Receive Livelock in an Interrupt-driven Kernel
Performance Implications of Multiple Pointer Sizes
Joel F. Bartlett
Robert N. Mayo
Amitabh Srivastava
USENIX Winter (1995), pp. 187-200
Improving HTTP Latency
Venkata N. Padmanabhan
Computer Networks and ISDN Systems, vol. 28 (1995), pp. 25-35
The Case for Persistent-Connection HTTP
SIGCOMM (1995), pp. 299-313
A Better Update Policy
USENIX Summer (1994), pp. 99-111
Recovery in Spritely NFS
Computing Systems, vol. 7 (1994), pp. 201-262
Big Memories on the Desktop
Workshop on Workstation Operating Systems (1993), pp. 110-115
Network Locality at the Scale of Processes
ACM Trans. Comput. Syst., vol. 10 (1992), pp. 81-109
Observing TCP Dynamics in Real Networks
SIGCOMM (1992), pp. 305-317
Network Locality at the Scale of Processes
SIGCOMM (1991), pp. 273-284
The Effect of Context Switches on Cache Performance
Efficient Use of Workstations for Passive Monitoring of Local Area Networks
SIGCOMM (1990), pp. 253-263
Path MTU discovery (RFC1191)
Spritely NFS: Experiments with Cache-Consistency Protocols
Measured capacity of an Ethernet: myths and reality
IP MTU discovery options (RFC1063)
Fragmentation considered harmful
The Packet Filter: An Efficient Mechanism for User-level Network Code
Internet Standard Subnetting Procedure (RFC950)
Broadcasting Internet Datagrams (RFC919)
IETF (1984)
Internet subnets (RFC917)
IETF (1984)
A Reverse Address Resolution Protocol (RFC903)
Representing Information About Files
ICDCS (1984), pp. 432-439
Broadcasting Internet datagrams in the presence of subnets (RFC922)
IETF (1984)