Obfuscatory obscanturism: making workload traces of commercially-sensitive systems safe to release
Venue
CloudMAN, IEEE, Maui, HI, USA (2012)
Publication Year
2012
Authors
Charles Reiss, John Wilkes, Joseph L. Hellerstein
Abstract
Cloud providers such as Google are interested in fostering research on the daunting
technical challenges they face in supporting planetary-scale distributed systems,
but no academic organizations have similar-scale systems on which to experiment.
Fortunately, good research can still be done using traces of real-life production
workloads, but there are risks in releasing such data, including inadvertently
disclosing confidential or proprietary information, as happened with the Netflix
Prize data. This paper discusses these risks, and our approach to them, which we
call systematic obfuscation. It protects proprietary and personal data while
leaving it possible to answer some interesting research questions. We explain and
motivate some of the risks and concerns and propose how they can best be mitigated,
using as an example our recent publication of a month-long trace of a production
system workload on an 11k-machine cluster.
