Comparing Consensus Monte Carlo Strategies for Distributed Bayesian Computation
Abstract
Consensus Monte Carlo is an algorithm for conducting Monte Carlo-based
Bayesian inference on large data sets distributed across many
worker machines in a data center. The algorithm works by running a
separate Monte Carlo algorithm on each worker machine, each of which
sees only a portion of the full data set. The worker-level posterior
samples are then combined to form a Monte Carlo approximation to the
full posterior distribution based on the complete data set. We
compare several methods of carrying out the combination, including a
new method based on approximating worker-level simulations using a
mixture of multivariate Gaussian distributions. We find that
resampling and kernel-density-based methods break down beyond 10
dimensions, and sometimes fewer, while the new mixture-based approach
works well but requires mixture models that take too long to fit.
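
As an illustration of the split-then-combine workflow described above, the
following is a minimal sketch on a toy problem, not one of the methods
compared in this paper. It assumes a scalar Gaussian mean with known
variance, draws each worker's posterior sample exactly in place of running a
Monte Carlo algorithm, gives each worker the prior raised to the power 1/S,
and combines draws by precision-weighted averaging, which is one simple
combination rule chosen here for illustration. The names consensus_average
and demo, and all numerical settings, are hypothetical.

```python
import numpy as np


def consensus_average(worker_draws):
    """Combine per-worker draws of a scalar parameter by weighted averaging.

    worker_draws has shape (S, G): S workers, G draws per worker.
    Weights are inverse sample variances (an illustrative assumption).
    """
    worker_draws = np.asarray(worker_draws, dtype=float)
    weights = 1.0 / worker_draws.var(axis=1, ddof=1)   # shape (S,)
    return weights @ worker_draws / weights.sum()      # shape (G,)


def demo(num_workers=10, n_per_worker=200, num_draws=5000, seed=0):
    rng = np.random.default_rng(seed)
    sigma2, tau2 = 1.0, 100.0  # known data variance, prior variance (toy values)

    # Each row of y is the shard of data seen by one worker.
    y = rng.normal(2.0, np.sqrt(sigma2), size=(num_workers, n_per_worker))

    # Worker-level posterior under the fractionated prior N(0, S * tau2).
    prec = n_per_worker / sigma2 + 1.0 / (num_workers * tau2)
    means = y.sum(axis=1) / sigma2 / prec

    # Exact draws stand in for each worker's Monte Carlo output.
    draws = rng.normal(means[:, None], np.sqrt(1.0 / prec),
                       size=(num_workers, num_draws))

    combined = consensus_average(draws)

    # Full-data posterior, available in closed form for this toy model.
    full_prec = y.size / sigma2 + 1.0 / tau2
    full_mean = y.sum() / sigma2 / full_prec
    return combined, full_mean, 1.0 / full_prec


if __name__ == "__main__":
    combined, full_mean, full_var = demo()
    print("combined mean:", combined.mean(), "full-data mean:", full_mean)
    print("combined var: ", combined.var(ddof=1), "full-data var: ", full_var)
```

In this Gaussian toy case the averaged draws should match the full-data
posterior up to Monte Carlo error; non-Gaussian worker posteriors are the
setting where alternative combination methods become relevant.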