Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
Venue
SIGMOD '13: Proceedings of the 2013 international conference on Management of data, ACM, New York, NY, USA, pp. 577-588
Publication Year
2013
Authors
Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman
BibTeX
Abstract
Web-based enterprises process events generated by millions of users interacting
with their websites. Rich statistical data distilled from combining such
interactions in near real-time generates enormous business value. In this paper, we
describe the architecture of Photon, a geographically distributed system for
joining multiple continuously flowing streams of data in real-time with high
scalability and low latency, where the streams may be unordered or delayed. The
system fully tolerates infrastructure degradation and datacenter-level outages
without any manual intervention. Photon guarantees that there will be no duplicates
in the joined output (at-most-once semantics) at any point in time, that most
joinable events will be present in the output in real-time (near-exact semantics),
and exactly-once semantics eventually. Photon is deployed within Google Advertising
System to join data streams such as web search queries and user clicks on
advertisements. It produces joined logs that are used to derive key business
metrics, including billing for advertisers. Our production deployment processes
millions of events per minute at peak with an average end-to-end latency of less
than 10 seconds. We also present challenges and solutions in maintaining large
persistent state across geographically distant locations, and highlight the design
principles that emerged from our experience.