Cold-Start Reinforcement Learning with Softmax Policy Gradients

  Abstract