Daniel Myers

Authored Publications
    Fast key-value stores: An idea whose time has come and gone
    Atul Adya
    Henry Qin
    Robert Grandl
    HotOS XVII (2019) (to appear)
    Abstract: Remote, in-memory key-value (RInK) stores such as Memcached and Redis are widely used in industry and are an active area of academic research. Coupled with stateless application servers to execute business logic and a database-like system to provide persistent storage, they form a core component of popular data center service architectures. We argue that the time of the RInK store has come and gone: their domain-independent APIs (e.g., PUT/GET) push complexity back to the application, leading to extra (un)marshalling overheads and network hops. Instead, data center services should be built using stateful application servers or custom in-memory stores with domain-specific APIs, which offer higher performance than RInKs at lower cost. Such designs have been avoided because they are challenging to implement without appropriate infrastructure support. Given recent advances in auto-sharding, we argue it is time to revisit these decisions. In this paper, we evaluate the potential performance improvements of stateful designs, propose a new abstraction, the linked, in-memory key-value (LInK) store, to enable developers to easily implement stateful services, and discuss areas for future research.
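The (un)marshalling overhead the abstract attributes to domain-independent PUT/GET APIs can be sketched in a few lines of Python. The `RinkStore` and `LinkStore` classes below are illustrative stand-ins, not APIs from the paper:

```python
import json

# Stand-in for a remote key-value (RInK) store client; illustrative only.
class RinkStore:
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value

store = RinkStore()

# RInK pattern: every update fetches the whole object, unmarshals it,
# mutates it, re-marshals it, and writes the whole object back.
def append_event_rink(user_id, event):
    blob = store.get(user_id) or "[]"
    events = json.loads(blob)               # unmarshal
    events.append(event)                    # the actual business logic
    store.put(user_id, json.dumps(events))  # re-marshal the entire object

# Domain-specific API in the spirit of a LInK store: the store understands
# the structure, so the caller expresses intent directly and no full-object
# (un)marshalling round trip is needed.
class LinkStore(RinkStore):
    def append(self, key, item):
        self._data.setdefault(key, []).append(item)

link = LinkStore()
link.append("u1", {"type": "click"})
```

In the RInK version, the marshalling cost grows with the size of the stored object even when the logical change is one element; the domain-specific version pays only for the change.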
    Slicer: Auto-Sharding for Datacenter Applications
    Atul Adya
    Jon Howell
    Jeremy Elson
    Colin Meek
    Vishesh Khemani
    Stefan Fulger
    Pan Gu
    Lakshminath Bhuvanagiri
    Jason Hunter
    Roberto Peon
    Alexander Shraer
    Kfir Lev-Ari
    OSDI 2016 (2016)
    Abstract: Sharding is a fundamental building block of large-scale applications, but most have their own custom, ad-hoc implementations. Our goal is to make sharding as easily reusable as a filesystem or lock manager. Slicer is Google's general-purpose sharding service. It monitors signals such as load hotspots and server health and dynamically shards work over a set of servers. Its goals are to maintain high availability and reduce load imbalance while minimizing churn from moved work. In this paper, we describe Slicer's design and implementation. Slicer has the consistency and global optimization of a centralized sharder while approaching the high availability, scalability, and low latency of systems that make local decisions. It achieves this by separating concerns: a reliable data plane forwards requests, and a smart control plane makes load-balancing decisions off the critical path. Slicer's small but powerful API has proven useful and easy to adopt in dozens of Google applications. It is used to allocate resources for web service front-ends, coalesce writes to increase storage bandwidth, and increase the efficiency of a web cache. It currently handles 2-6M req/s of production traffic. Production workloads using Slicer exhibit a most-loaded task at 30%-180% of the mean load, even for highly skewed and time-varying loads.
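The data-plane half of the abstraction described above can be sketched as follows: a sharder assigns contiguous ranges of a hashed keyspace ("slices") to server tasks, and request forwarding is a lookup against that assignment. This is a toy static version with hypothetical names, not Slicer's actual API:

```python
import bisect
import hashlib

# Illustrative sharder: splits the 64-bit hashed keyspace into equal
# slices and assigns them round-robin to tasks. In Slicer, a control
# plane would instead adjust this assignment dynamically based on load.
class Sharder:
    def __init__(self, tasks, num_slices=8):
        step = 2**64 // num_slices
        self.boundaries = [(i + 1) * step for i in range(num_slices - 1)]
        self.assignment = [tasks[i % len(tasks)] for i in range(num_slices)]

    def _hash(self, key):
        # Hash keys so that slices cover the keyspace evenly.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def task_for_key(self, key):
        # Data plane: find the slice containing the key's hash and
        # forward the request to the task that owns that slice.
        slice_idx = bisect.bisect(self.boundaries, self._hash(key))
        return self.assignment[slice_idx]

sharder = Sharder(tasks=["task-0", "task-1", "task-2"])
owner = sharder.task_for_key("user:42")  # same key always maps to the same task
```

Keeping the lookup this cheap is what lets the data plane stay on the critical path while the control plane's load-balancing decisions happen off it.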
    Thialfi: A Client Notification Service for Internet-Scale Applications
    Atul Adya
    Gregory Cooper
    Michael Piatek
    Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP) (2011), pp. 129-142
    Abstract: Ensuring the freshness of client data is a fundamental problem for applications that rely on cloud infrastructure to store data and mediate sharing. Thialfi is a notification service developed at Google to simplify this task. Thialfi supports applications written in multiple programming languages and running on multiple platforms, e.g., browsers, phones, and desktops. Applications register their interest in a set of shared objects and receive notifications when those objects change. Thialfi servers run in multiple Google data centers for availability and replicate their state asynchronously. Thialfi's approach to recovery emphasizes simplicity: all server state is soft, and clients drive recovery and assist in replication. A principal goal of our design is to provide a straightforward API and good semantics despite a variety of failures, including server crashes, communication failures, storage unavailability, and data center failures. Evaluation of live deployments confirms that Thialfi is scalable, efficient, and robust. In production use, Thialfi has scaled to millions of users and delivers notifications with an average delay of less than one second.
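The register/notify pattern the abstract describes can be sketched in miniature. All names here are illustrative; Thialfi's real API and delivery semantics differ:

```python
from collections import defaultdict

# Toy notification service: clients register interest in object IDs and
# receive change notifications (not the data itself); on notification,
# a client refetches fresh data from storage.
class NotificationService:
    def __init__(self):
        # Soft state: if this table is lost, clients can rebuild it by
        # re-registering, which is the recovery style the paper describes.
        self._interested = defaultdict(set)  # object_id -> {(client, callback)}

    def register(self, client, object_id, callback):
        self._interested[object_id].add((client, callback))

    def object_changed(self, object_id, version):
        # Deliver only (object_id, version); payloads stay in storage.
        for client, callback in self._interested[object_id]:
            callback(object_id, version)

received = []
svc = NotificationService()
svc.register("client-A", "doc:123", lambda oid, v: received.append((oid, v)))
svc.object_changed("doc:123", version=7)
```

Delivering versions rather than payloads keeps server state small and disposable, which is what makes the "all server state is soft" recovery strategy workable.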