Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries
Venue
Proceedings on Privacy Enhancing Technologies (PoPETS), vol. issue 3, 2016 (2016) (to appear)
Publication Year
2016
Authors
Giulia Fanti, Vasyl Pihur, Úlfar Erlingsson
BibTeX
Abstract
Techniques based on randomized response enable the collection of potentially
sensitive data from clients in a privacy-preserving manner with strong local
differential privacy guarantees. A recent such technology, RAPPOR, enables
estimation of the marginal frequencies of a set of strings via privacy-preserving
crowdsourcing. However, this original estimation process relies on a known
dictionary of possible strings; in practice, this dictionary can be extremely large
and/or unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR
mechanism that enables the estimation of "unknown unknowns,'' i.e., strings we do
not know we should be estimating. To enable learning without explicit dictionary
knowledge, we develop methodology for estimating the joint distribution of multiple
variables collected with RAPPOR. Our contributions are not RAPPOR-specific, and can
be generalized to other local differential privacy mechanisms for learning
distributions of string-valued random variables.
