Scaling Distributed Machine Learning with the Parameter Server
Venue
11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), USENIX Association, pp. 583–598
Publication Year
2014
Authors
Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su
BibTeX
@inproceedings{li2014parameterserver,
  title     = {Scaling Distributed Machine Learning with the Parameter Server},
  author    = {Li, Mu and Andersen, David G. and Park, Jun Woo and Smola, Alexander J. and Ahmed, Amr and Josifovski, Vanja and Long, James and Shekita, Eugene J. and Su, Bor-Yiing},
  booktitle = {11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14)},
  publisher = {USENIX Association},
  pages     = {583--598},
  year      = {2014}
}
Abstract
We propose a parameter server framework for distributed machine learning problems.
Both data and workloads are distributed over worker nodes, while the server nodes
maintain globally shared parameters, represented as dense or sparse vectors and
matrices. The framework manages asynchronous data communication between nodes, and
supports flexible consistency models, elastic scalability, and continuous fault
tolerance. To demonstrate the scalability of the proposed framework, we show
experimental results on petabytes of real data with billions of examples and
parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet
Allocation and Distributed Sketching.
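
The architecture the abstract describes (server nodes holding the globally shared parameters, worker nodes holding data shards and exchanging updates asynchronously) can be illustrated with a minimal single-process sketch. The KVServer class, the push/pull method names, and the toy least-squares task below are illustrative assumptions for exposition, not the paper's actual API.

import threading
import numpy as np

class KVServer:
    """Stands in for the server nodes: holds the shared parameter
    vector and applies worker updates with thread-safe push/pull."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        # Workers fetch a snapshot of the current parameters.
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        # Apply a worker's gradient as soon as it arrives; there is
        # no barrier between workers (fully asynchronous updates).
        with self.lock:
            self.w -= self.lr * grad

def worker(server, X, y, steps):
    # Each worker trains on its own data shard, alternating pull and push.
    for _ in range(steps):
        w = server.pull()                  # fetch current parameters
        grad = X.T @ (X @ w - y) / len(y)  # local least-squares gradient
        server.push(grad)                  # send update back without waiting

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
server = KVServer(dim=5)
threads = []
for _ in range(4):                         # 4 workers, each with a data shard
    X = rng.normal(size=(200, 5))
    t = threading.Thread(target=worker, args=(server, X, X @ w_true, 50))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print("parameter error:", np.linalg.norm(server.pull() - w_true))

In the paper itself the shared state is sharded across many server nodes and keyed by feature index (supporting sparse vectors and matrices), and the consistency model between pull and push is configurable; the single lock above simply stands in for that machinery.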
