Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

   Abstract