Dialogue management is the component of a dialogue system that determines the
optimal action for the system to take at each turn. An important consideration for
dialogue managers is the ability to adapt to new user behaviors unseen during
training. In this paper, we investigate policy-gradient-based methods for
interactive reinforcement learning, in which the agent receives action-specific
feedback from the user and incorporates this feedback into its policy. We show that
using the feedback to directly shape the policy enables a dialogue manager to learn
new interactions faster than interpreting the feedback as a reward value.