Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

   Abstract