Aaron Archer
Aaron is a Research Scientist at Google NYC, where he leads the Large-Scale Optimization research team, which is part of the broader NYC Algorithms and Optimization team. He came to Google in 2013 after 9+ years in the Algorithms and Optimization department at the AT&T Shannon Research Laboratory. Aaron earned degrees in Operations Research (M.S. 2002, Ph.D. 2004) from Cornell University, where he was advised by Eva Tardos and supported by a Hertz Fellowship. Prior to that, he studied Mathematics (B.S. 1998) at Harvey Mudd College, where he graduated first in his class.
Aaron has pursued theoretical, experimental and applied research in diverse areas including mathematical programming (e.g., linear, integer, and convex programming), approximation algorithms, graph algorithms, algorithmic game theory, online algorithms, network design, load balancing, and unsupervised machine learning (e.g., clustering). He particularly enjoys finding effective ways to model complex real-world optimization problems. His primary mission at Google has been to apply these and other techniques to improve the efficiency of Google's computational infrastructure, such as the backend for websearch.
Aaron has also made small forays into graphics, machine vision, combinatorics, graph theory, computational ecology, and mathematical rap.
Authored Publications
Google Publications
Other Publications
Sort By
Cache-aware load balancing of data center applications
Aaron Schild
Ray Yang
Richard Zhuang
Proceedings of the VLDB Endowment, vol. 12 (2019), pp. 709-723
Preview abstract
Our deployment of cache-aware load balancing in the Google web search backend reduced cache misses by ~0.5x, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily.
A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. Flash bandwidth is a critical bottleneck, motivating an application-directed RAM cache on each replica. Sending the same term reliably to the same replica would increase the chance it hits cache, and avoid polluting the other replicas' caches. However, most queries contain multiple terms and we have to send the whole query to one replica, so it is not possible to achieve a perfect partitioning of terms to replicas.
We solve this via a voting scheme, whereby the load balancer conducts a weighted vote by the terms in each query, and sends the query to the winning replica. We develop a multi-stage scalable algorithm to learn these weights. We first construct a large-scale term-query graph from logs and apply a distributed balanced graph partitioning algorithm to cluster each term to a preferred replica. This yields a good but simplistic initial voting table, which we then iteratively refine via cache simulation to capture feedback effects.
View details
Wireless coverage prediction via parametric shortest paths
David S. Johnson
Evdokia Nikolova
Mikkel Thorup
Ger Yang
Proceedings of the Nineteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc'18), ACM (2018), pp. 221-230
Preview abstract
When deciding where to place access points in a wireless network, it is useful to model the signal propagation loss between a proposed antenna location and the areas it may cover. The indoor dominant path (IDP) model, introduced by Wölfle et al., is shown in the literature to have good validation and generalization error, is faster to compute than competing methods, and is used in commercial software such as WinProp, iBwave Design, and CellTrace. Previously, the algorithms known for computing it involved a worst-case exponential-time tree search, with pruning heuristics to speed it up.
We prove that the IDP model can be reduced to a parametric shortest path computation on a graph derived from the walls in the floorplan. It therefore admits a quasipolynomial-time (i.e., nO(logn)) algorithm. We also give a practical approximation algorithm based on running a small constant number of shortest path computations. Its provable worst-case additive error (in dB) can be made arbitrarily small via appropriate choices of parameters, and is well below 1dB for reasonable choices. We evaluate our approximation algorithm empirically against the exact IDP model, and show that it consistently beats its theoretical worst-case bounds, solving the model exactly (i.e., no error) in the vast majority of cases.
View details
Preview abstract
We consider the reachability indexing problem for private-public directed graphs. In these graphs nodes come in three flavors: public—nodes visible to all users, private—nodes visible to a specific set of users, and protected—nodes visible to any user who can see at least one of the node’s parents. We are interested in computing the set of nodes visible to a specific user online. There are two obvious algorithms: precompute the result for every user, or run a reachability algorithm at query time. This paper explores the trade-off between these two strategies.
Our approach is to identify a set of additional visible seed nodes for each user. The online reachability algorithm explores the graph starting at these nodes. We first formulate the problem as asymmetric k-center with outliers, and then give an efficient and practical algorithm. We prove new theoretical guarantees for this problem and show empirically that it performs very well in practice.
View details
Optimal Content Placement for a Large-Scale VoD System
Vijay Gopalakrishnan
Seungjoon Lee
K.K. Ramakrishnan
IEEE/ACM Transactions on Networking, vol. 24 (2016), pp. 2114-2127
Preview abstract
IPTV service providers offering Video-on-Demand currently use servers at each metropolitan office to store all the videos in their library. With the rapid increase in library sizes, it will soon become infeasible to replicate the entire library at each office. We present an approach for intelligent content placement that scales to large library sizes (e.g., 100Ks of videos). We formulate the problem as a mixed integer program (MIP) that takes into account constraints such as disk space, link bandwidth, and content popularity. To overcome the challenges of scale, we employ a Lagrangian relaxation-based decomposition technique combined with integer rounding. Our technique finds a near-optimal solution (e.g., within 1-2%) with orders of magnitude speedup relative to solving even the LP relaxation via standard software. We also present simple strategies to address practical issues such as popularity estimation, content updates, short-term popularity fluctuation, and frequency of placement updates. Using traces from an operational system, we show that our approach significantly outperforms simpler placement strategies. For instance, our MIP-based solution can serve all requests using only half the link bandwidth used by LRU or LFU cache replacement policies. We also investigate the trade-off between disk space and network bandwidth.
View details
Content placement via the exponential potential function method
Vijay Gopalakrishnan
Seungjoon Lee
K.K. Ramakrishnan
Integer Programming and Combinatorial Optimization - 16th International Conference, IPCO 2013, Springer, pp. 49-61
Combining predictors for recommending music: the False Positives' approach to KDD Cup track 2
Suhrid Balakrishnan
Rensheng Wang
Carlos Eduardo Scheidegger
Angus MacLellan
Yifan Hu
Shankar Krishnan
Guangqin Ma
S. Tom Au
JMLR Proceedings, vol. 18 (2012), pp. 199-213
Leveraging video viewing patterns for optimal content placement
Kyung-Wook Hwang
Vijay Gopalakrishnan
Seungjoon Lee
Vishal Misra
K. K. Ramakrishnan
Deborah F. Swayne
NETWORKING 2012 - 11th International IFIP TC 6 Networking Conference Proceedings, Part II, pp. 44-58
Improved Approximation Algorithms for Prize-Collecting Steiner Tree and TSP
MohammadTaghi Hajiaghayi
Howard Karloff
SIAM Journal on Computing, vol. 40(2) (2011), pp. 309-332
Improved approximation algorithms for the minimum latency problem via prize-collecting strolls
Anna Blasiak
Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, SIAM, pp. 429-447
Voting power and target-based site prioritization
Steven J. Phillips
Robert L. Pressey
Desmond Torkornoo
David Johnson
Matthew E. Watts
Biological Conservation, vol. 143 (2010), pp. 1989-1997
Optimizing dispersal corridors for the Cape Proteaceae using network flow
Steven J. Phillips
Paul Williams
Guy Midgley
Ecological Applications, vol. 18 (2008), pp. 1200-1211
A faster, better approximation algorithm for the minimum latency problem
Asaf Levin
David P. Williamson
SIAM Journal on Computing, vol. 37 (2008), pp. 1472-1498
Importance sampling via load-balanced facility location
Shankar Krishnan
Integer Programming and Combinatorial Optimization, 13th International Conference, IPCO 2008, Springer, pp. 316-330
Characterizing truthful mechanisms with convex type spaces
Frugal path mechanisms
The 15 puzzle: How it drove the world crazy, by Jerry Slocum and Dic Sonneveld [book review]
Mathematical Intelligencer, vol. 29 (2007), pp. 83-85
Approximation and collusion in multicast cost sharing
Joan Feigenbaum
Arvind Krishnamurthy
Rahul Sami
Scott Shenker
Games and Economic Behavior, vol. 47 (2004), pp. 36-71
Approximate classification via earthmover metrics
Jittat Fakcharoenphol
Robert Krauthgamer
Kunal Talwar
Eva Tardos
Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, SIAM, pp. 1079-1087
An approximate truthful mechanism for combinatorial auctions with single parameter agents
Christos H. Papadimitriou
Kunal Talwar
Eva Tardos
Internet Mathematics, vol. 1 (2003), pp. 129-150
Lagrangian relaxation for the k-median problem: new insights and continuity properties
Ranjithkumar Rajagopalan
David B. Shmoys
Algorithms - ESA 2003, 11th Annual European Symposium, Springer, pp. 31-42
Two O(log* k)-approximation algorithms for the asymmetric k-center problem
Integer Programming and Combinatorial Optimization, 8th International IPCO Conference (2001), pp. 1-14
March Mathness: an analysis of a nonstandard basketball pool
Truthful mechanisms for one-parameter agents
Eva Tardos
42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, pp. 482-491
On the upper chromatic numbers of the reals
Discrete Mathematics, vol. 214 (2000), pp. 65-75
A modern treatment of the 15 puzzle
American Mathematical Monthly, vol. 106 (1999), pp. 793-799
A case for stricter grading