Federated Learning of Deep Networks using Model Averaging
Venue
Preprint (2016)
Publication Year
2016
Authors
H. Brendan McMahan, Eider Moore, Daniel Ramage, Blaise Agüera y Arcas
Abstract
Modern mobile devices have access to a wealth of data suitable for learning models,
which in turn can greatly improve the user experience on the device. For example,
language models can improve speech recognition and text entry, and image models can
automatically select good photos. However, this rich data is often privacy
sensitive, large in quantity, or both, which may preclude logging to the
data-center and training there using conventional approaches. We advocate an
alternative that leaves the training data distributed on the mobile devices, and
learns a shared model by aggregating locally-computed updates. We term this
decentralized approach Federated Learning. We present a practical method for the
federated learning of deep networks that proves robust to the unbalanced and
non-IID data distributions that naturally arise. This method allows high-quality
models to be trained in relatively few rounds of communication, the principal
constraint for federated learning. The key insight is that despite the non-convex
loss functions we optimize, parameter averaging over updates from multiple clients
produces surprisingly good results, for example decreasing the communication needed
to train an LSTM language model by two orders of magnitude.
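
The model-averaging scheme summarized in the abstract (later widely known as FedAvg) can be illustrated with a minimal simulation: each client runs a few local gradient steps on its own data, and a server averages the resulting parameters, weighted by client dataset size. The sketch below is an illustrative toy under assumed details (a linear least-squares model, synthetic unbalanced client data, made-up hyperparameters), not the authors' implementation.

```python
# Minimal sketch of communication-round model averaging (FedAvg-style).
# All data, the model, and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, epochs=5, lr=0.1):
    """Run a few epochs of plain gradient descent on one client's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

# Unbalanced, non-identically distributed client datasets (synthetic).
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for n in (20, 200, 50):  # different dataset sizes per client
    shift = rng.normal(scale=0.5, size=3)          # client-specific feature shift
    X = rng.normal(size=(n, 3)) + shift
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

global_w = np.zeros(3)
for _ in range(20):  # communication rounds
    local_weights, sizes = [], []
    for X, y in clients:
        local_weights.append(local_update(global_w, X, y))
        sizes.append(len(y))
    # Server step: average client parameters, weighted by dataset size.
    global_w = np.average(np.stack(local_weights), axis=0,
                          weights=np.array(sizes, dtype=float))

print("averaged model:", global_w)
```

Weighting the average by each client's dataset size mirrors the unbalanced-data setting the abstract highlights: clients holding more examples contribute proportionally more to the shared model each round.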
