Motivated by the need for distributed optimization algorithms with low
communication cost, we study communication efficient algorithms to perform
distributed mean estimation. We study the scenarios in which each client sends one
bit per dimension. We first show that for d dimensional data with n clients, a
naive stochastic rounding approach yields a mean squared error Theta(d/n). We then
show by applying a structured random rotation of the data (an O(dlogd) algorithm),
the error can be reduced to O(logd/n). The methods we show in this paper do not
depend on the distribution of the data.