Bayesian Dark Knowledge
Venue
Advances in Neural Information Processing Systems (2015)
Publication Year
2015
Authors
Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling
BibTeX
@inproceedings{korattikara2015bayesian,
  title     = {Bayesian Dark Knowledge},
  author    = {Korattikara, Anoop and Rathod, Vivek and Murphy, Kevin and Welling, Max},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2015}
}
Abstract
We consider the problem of Bayesian parameter estimation for deep neural networks,
which is important in problem settings where we may have little data, and/or where
we need accurate posterior predictive densities, e.g., for applications involving
bandits or active learning. One simple approach to this is to use online Monte
Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately,
such a method needs to store many copies of the parameters (which wastes memory),
and needs to make predictions using many versions of the model (which wastes time).
We describe a method for "distilling" a Monte Carlo approximation to the posterior
predictive density into a more compact form, namely a single deep neural network.
We compare to two very recent approaches to Bayesian neural networks, namely an
approach based on expectation propagation [Hernandez-Lobato and Adams, 2015] and an
approach based on variational Bayes [Blundell et al., 2015]. Our method performs
better than both of these, is much simpler to implement, and uses less computation
at test time.
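
Illustrative code sketch
The abstract describes two steps: drawing posterior samples with SGLD, then distilling the resulting Monte Carlo posterior predictive into a single compact model. The sketch below is a minimal toy illustration of that pipeline, not the paper's implementation: it substitutes a 1-D logistic-regression "teacher" for a deep network, and distills by fitting a single student model to the teacher's averaged soft predictions. The data, the Gaussian prior, and all hyperparameters (step size, burn-in, thinning, learning rate) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary classification data (hypothetical stand-in for real benchmarks).
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)
N = len(X)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Teacher: SGLD over the parameters of a logistic-regression model ---
# SGLD update: theta <- theta + (eps/2) * grad log p(theta | minibatch) + N(0, eps),
# with the minibatch log-likelihood gradient rescaled by N / batch.
w, b = 0.0, 0.0
eps, n_steps, batch = 1e-3, 5000, 32
prior_prec = 1.0  # precision of a Gaussian prior on the parameters (assumed)
samples = []
for t in range(n_steps):
    idx = rng.choice(N, batch, replace=False)
    p = sigmoid(w * X[idx, 0] + b)
    gw = (N / batch) * np.sum((y[idx] - p) * X[idx, 0]) - prior_prec * w
    gb = (N / batch) * np.sum(y[idx] - p) - prior_prec * b
    w += 0.5 * eps * gw + rng.normal(scale=np.sqrt(eps))
    b += 0.5 * eps * gb + rng.normal(scale=np.sqrt(eps))
    if t > 1000 and t % 10 == 0:  # discard burn-in, then thin
        samples.append((w, b))

# Monte Carlo posterior predictive: average predictions over all stored samples.
# This is what wastes memory (many parameter copies) and time (many forward passes).
def teacher_predict(x):
    return np.mean([sigmoid(ws * x + bs) for ws, bs in samples], axis=0)

# --- Student: distill the averaged predictive into one compact model ---
# Fit a single weight/bias pair to the teacher's soft probabilities by
# gradient ascent on the cross-entropy against those soft targets.
sw, sb, lr = 0.0, 0.0, 0.1
soft = teacher_predict(X[:, 0])
for _ in range(2000):
    p = sigmoid(sw * X[:, 0] + sb)
    sw += lr * np.mean((soft - p) * X[:, 0])
    sb += lr * np.mean(soft - p)

print("teacher predictive at x=0.5:", teacher_predict(0.5))
print("student predictive at x=0.5:", sigmoid(sw * 0.5 + sb))

After distillation, test-time prediction touches only the single student model rather than hundreds of stored parameter samples, which is the memory and compute saving the abstract claims.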
