Chad Whipkey

Authored Publications
    Shasta: Interactive Reporting at Scale
    Stephan Ellner
    Apurv Gupta
    Ben Handy
    Bart Samwel
    Larysa Aharkava
    Jun Xu
    Shivakumar Venkataraman
    Divy Agrawal
    Jeffrey D. Ullman
    SIGMOD, San Francisco, CA (2016) (to appear)
    Abstract: We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google’s Internet advertising business. Shasta targets applications with challenging requirements: First, user query latencies must be low. Second, underlying transactional data stores have complex “read-unfriendly” schemas, placing significant transformation logic between stored data and the read-only views that Shasta exposes to its clients. This transformation logic must be expressed in a way that scales to large and agile engineering teams. Finally, Shasta targets applications with strong data freshness requirements, making it challenging to precompute query results using common techniques such as ETL pipelines or materialized views. Instead, online queries must go all the way from primary storage to user-facing views, resulting in complex queries joining 50 or more tables. Designed as a layer on top of Google’s F1 RDBMS and Mesa data warehouse, Shasta combines language and system techniques to meet these requirements. To help with expressing complex view specifications, we developed a query language called RVL, with support for modularized view templates that can be dynamically compiled into SQL. To execute these SQL queries with low latency at scale, we leveraged and extended F1’s distributed query engine with facilities such as safe execution of C++ and Java UDFs. To reduce latency and increase read parallelism, we extended F1 storage with a distributed read-only in-memory cache. The system we describe is in production at Google, powering critical applications used by advertisers and internal sales teams. Shasta has significantly improved system scalability and software engineering efficiency compared to the middleware solutions it replaced.
    F1: A Distributed SQL Database That Scales
    Bart Samwel
    Ben Handy
    Mircea Oancea
    Kyle Littlefield
    David Menestrina
    Stephan Ellner
    Ian Rae
    Traian Stancescu
    VLDB (2013)
    Abstract: F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
    F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
    Mircea Oancea
    Stephan Ellner
    Ben Handy
    Bart Samwel
    Xin Chen
    Beat Jegerlehner
    Kyle Littlefield
    Phoenix Tong
    SIGMOD (2012)
    Abstract: Many of the services that are critical to Google’s ad business have historically been backed by MySQL. We have recently migrated several of these services to F1, a new RDBMS developed at Google. F1 implements rich relational database features, including a strictly enforced schema, a powerful parallel SQL query engine, general transactions, change tracking and notification, and indexing, and is built on top of a highly distributed storage system that scales on standard hardware in Google data centers. The store is dynamically sharded, supports transactionally-consistent replication across data centers, and is able to handle data center outages without data loss. The strong consistency properties of F1 and its storage system come at the cost of higher write latencies compared to MySQL. Having successfully migrated a rich customer-facing application suite at the heart of Google’s ad business to F1, with no downtime, we will describe how we restructured schema and applications to largely hide this increased latency from external users. The distributed nature of F1 also allows it to scale easily and to support significantly higher throughput for batch workloads than a traditional RDBMS. With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding, and cost benefits so far available only in “NoSQL” systems with the usability, familiarity, and transactional guarantees expected from an RDBMS.