Availability in Globally Distributed Storage Systems
Venue
Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, USENIX (2010)
Publication Year
2010
Authors
Daniel Ford, Francois Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, Sean Quinlan
Abstract
Highly available cloud storage is often implemented with complex, multi-tiered
distributed systems built on top of clusters of commodity servers and disk drives.
Sophisticated management, load balancing, and recovery techniques are needed to
achieve high performance and availability amidst an abundance of failure sources
that include software, hardware, network connectivity, and power issues. While
there is a relative wealth of failure studies of individual components of storage
systems, such as disk drives, relatively little has been reported so far on the
overall availability behavior of large cloud-based storage services. We
characterize the availability properties of cloud storage systems based on an
extensive one-year study of Google's main storage infrastructure and present
statistical models that enable further insight into the impact of multiple design
choices, such as data placement and replication strategies. With these models we
compare data availability under a variety of system parameters given the real
patterns of failures observed in our fleet.