Publication Data

   MapReduce/Bigtable for Distributed Optimization

Abstract: For large data sets, it can be very time-consuming to run gradient-based optimization, for example to minimize the log-likelihood for maximum entropy models. Distributed methods are therefore appealing, and a number of distributed gradient optimization strategies have been proposed, including distributed gradient, asynchronous updates, and iterative parameter mixtures. In this paper, we evaluate these strategies with regard to their accuracy and speed over MapReduce/Bigtable and discuss the techniques needed for high performance.
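
As a rough illustration of two of the strategies named in the abstract, the sketch below contrasts distributed gradient (each shard computes a partial gradient, and the partial gradients are summed before a single update) with iterative parameter mixing (each shard trains locally, and the resulting parameters are averaged). It is not the paper's implementation: it runs in a single process, uses a two-class maximum entropy model (logistic regression) on toy data, and every name in it (shard_gradient, sum_gradients, distributed_gradient_descent, mix_parameters, DATA, SHARDS), along with the step counts and learning rate, is a hypothetical choice made for this example. Asynchronous updates are left out because they need shared mutable state that a short single-process sketch cannot show faithfully.

```python
import math

# Illustrative sketch only: a single-process stand-in for map and reduce
# steps, assuming a two-class maximum entropy (logistic regression) model.


def shard_gradient(weights, shard):
    """Map step: gradient of the negative log-likelihood on one shard."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        score = sum(w * x for w, x in zip(weights, features))
        prob = 1.0 / (1.0 + math.exp(-score))
        for j, x in enumerate(features):
            grad[j] += (prob - label) * x
    return grad


def sum_gradients(gradients):
    """Reduce step: element-wise sum of the per-shard gradients."""
    return [sum(parts) for parts in zip(*gradients)]


def distributed_gradient_descent(shards, dim, steps=200, rate=0.1):
    """Distributed gradient: one global update per pass over all shards."""
    weights = [0.0] * dim
    n = sum(len(shard) for shard in shards)
    for _ in range(steps):
        total = sum_gradients([shard_gradient(weights, s) for s in shards])
        weights = [w - rate * g / n for w, g in zip(weights, total)]
    return weights


def mix_parameters(shards, dim, epochs=20, local_steps=10, rate=0.1):
    """Iterative parameter mixing: train per shard, then average weights."""
    weights = [0.0] * dim
    for _ in range(epochs):
        local_weights = []
        for shard in shards:
            w = list(weights)  # every shard starts from the mixed weights
            for _ in range(local_steps):
                g = shard_gradient(w, shard)
                w = [wi - rate * gi / len(shard) for wi, gi in zip(w, g)]
            local_weights.append(w)
        weights = [sum(ws) / len(shards) for ws in zip(*local_weights)]
    return weights


# Toy data (an assumption for this example): ([bias, x], label)
DATA = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1), ([1.0, 3.0], 1)]
SHARDS = [DATA[0::2], DATA[1::2]]

if __name__ == "__main__":
    print("distributed gradient:", distributed_gradient_descent(SHARDS, 2))
    print("parameter mixing:    ", mix_parameters(SHARDS, 2))
```

Because the log-likelihood gradient is a sum over training examples, the summed shard gradients equal the gradient computed on the full data set, so the distributed gradient variant follows the same trajectory as single-machine gradient descent. Parameter mixing needs fewer synchronization points (one per epoch here rather than one per step), but it generally follows a different trajectory.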