Hierarchical Planning for Device Placement
We introduce a hierarchical planning algorithm for finding efficient device placements for computational graphs. It is particularly effective in environments that mix CPUs and GPUs. The algorithm first learns to assign graph operations to groups and then allocates these groups to the available devices. The grouping and the device allocation are learned jointly. The proposed algorithm is trained with a policy gradient method and requires no human intervention. Our experiments on computer vision and natural language model benchmarks show that the algorithm can find optimized, non-trivial placements for TensorFlow (TF) computational graphs with over 80,000 operations. Our approach also outperforms both human experts and a state-of-the-art method based on deep reinforcement learning. On a Neural Machine Translation model, our method achieves a 240% improvement in training time per iteration.
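To make the two-level structure concrete, here is a minimal sketch of grouping followed by placement, with both levels updated from a single reward. The abstract does not specify the model architecture, so this sketch makes several simplifying assumptions. It uses tabular softmax policies instead of neural networks. It replaces measured step time with a hypothetical cost model, `toy_runtime`, defined as the busiest device's compute load plus a fixed penalty per cross-device edge. It uses plain REINFORCE with a moving-average baseline as the policy gradient method. All names and parameters (`train`, `num_groups`, `comm_cost`, `device_speed`, and so on) are illustrative and do not come from the paper.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    # Draw an index from the categorical distribution `probs`.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def toy_runtime(op_placement, op_cost, edges, device_speed, comm_cost):
    # Hypothetical cost model standing in for timing a real training step:
    # compute time of the most loaded device plus a penalty per cross-device edge.
    load = [0.0] * len(device_speed)
    for op, dev in enumerate(op_placement):
        load[dev] += op_cost[op] / device_speed[dev]
    cross = sum(1 for u, v in edges if op_placement[u] != op_placement[v])
    return max(load) + comm_cost * cross


def train(op_cost, edges, device_speed, num_groups=4, comm_cost=0.5,
          steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_ops, n_dev = len(op_cost), len(device_speed)
    grouper = [[0.0] * num_groups for _ in range(n_ops)]  # op -> group logits
    placer = [[0.0] * n_dev for _ in range(num_groups)]   # group -> device logits
    baseline = None
    for _ in range(steps):
        g_probs = [softmax(row) for row in grouper]
        p_probs = [softmax(row) for row in placer]
        groups = [sample(p, rng) for p in g_probs]     # level 1: ops to groups
        devices = [sample(p, rng) for p in p_probs]    # level 2: groups to devices
        op_placement = [devices[g] for g in groups]
        reward = -toy_runtime(op_placement, op_cost, edges, device_speed, comm_cost)
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        adv = reward - baseline
        # REINFORCE: gradient of log softmax w.r.t. logits is onehot(action) - probs.
        # Both policies share the same reward, so they are trained jointly.
        for op, g in enumerate(groups):
            for k in range(num_groups):
                grouper[op][k] += lr * adv * ((1.0 if k == g else 0.0) - g_probs[op][k])
        for g, d in enumerate(devices):
            for k in range(n_dev):
                placer[g][k] += lr * adv * ((1.0 if k == d else 0.0) - p_probs[g][k])
    # Greedy decode: most likely group per op, then most likely device per group.
    best_group = [max(range(num_groups), key=lambda k: row[k]) for row in grouper]
    best_dev = [max(range(n_dev), key=lambda k: row[k]) for row in placer]
    return [best_dev[g] for g in best_group]


if __name__ == "__main__":
    # Hypothetical 6-op chain, one slow "CPU" (speed 1) and two faster "GPUs" (speed 4).
    op_cost = [1.0, 4.0, 4.0, 4.0, 4.0, 1.0]
    edges = [(i, i + 1) for i in range(5)]
    device_speed = [1.0, 4.0, 4.0]
    placement = train(op_cost, edges, device_speed)
    print(placement, toy_runtime(placement, op_cost, edges, device_speed, 0.5))
```

The point of the two levels is that the placer only chooses among `num_groups` devices assignments rather than one per operation, which keeps the placement decision small even when the graph is large. In this toy version the grouper is still a per-operation table; a realistic grouper would share parameters across operations using their features.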