Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation
Venue
Brazilian Journal of Probability and Statistics, vol. TBD (2017), TBD
Publication Year
2017
Authors
Steve Scott
Abstract
Consensus Monte Carlo is an algorithm for conducting Monte Carlo-based Bayesian
inference on large data sets distributed across many worker machines in a data
center. The algorithm works by running a separate Monte Carlo algorithm on each
worker machine, which only sees a portion of the full data set. The worker-level
posterior samples are then combined to form a Monte Carlo approximation to the full
posterior distribution based on the complete data set. We compare several methods
of carrying out the combination, including a new method based on approximating
worker-level simulations using a mixture of multivariate Gaussian distributions. We
find that resampling and kernel-density-based methods break down after 10 or
sometimes fewer dimensions, while the new mixture-based approach works well, but
the necessary mixture models take too long to fit.
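
The worker-then-combine pattern described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a toy model (a Gaussian mean with known standard deviation), a prior split evenly across workers, and a precision-weighted average as the combination rule. That rule is only one possible way to combine draws; it is not the mixture-based approach the abstract introduces. The names `worker_draws` and `consensus_average` are hypothetical.

```python
import random
import statistics


def worker_draws(shard, n_workers, n_draws, rng, sigma=1.0, prior_var=100.0):
    """Sample one worker's posterior for a Gaussian mean with known sigma.

    Assumption: the N(0, prior_var) prior is raised to the power
    1 / n_workers, which multiplies its variance by n_workers, so the
    worker-level priors multiply back to the original prior.
    """
    prior_precision = 1.0 / (prior_var * n_workers)
    data_precision = len(shard) / sigma**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (sum(shard) / sigma**2)
    return [rng.gauss(post_mean, post_var**0.5) for _ in range(n_draws)]


def consensus_average(draws_by_worker):
    """Combine worker draws by a precision-weighted average, draw by draw.

    Each worker's weight is the inverse of the sample variance of its draws.
    """
    weights = [1.0 / statistics.variance(d) for d in draws_by_worker]
    total = sum(weights)
    n_draws = len(draws_by_worker[0])
    return [
        sum(w * d[g] for w, d in zip(weights, draws_by_worker)) / total
        for g in range(n_draws)
    ]


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(4000)]  # synthetic data set
n_workers = 8
shards = [data[i::n_workers] for i in range(n_workers)]  # one shard per worker

# Each worker sees only its own shard and samples independently.
draws = [worker_draws(s, n_workers, 2000, rng) for s in shards]

# The combined draws approximate the posterior given the complete data set.
combined = consensus_average(draws)
print(statistics.mean(combined), statistics.variance(combined))
```

In this toy setting every worker-level posterior is exactly Gaussian, so weighted averaging should recover the full-data posterior closely. The alternatives compared in the abstract are aimed at cases where that Gaussian assumption is not appropriate.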