Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Samy Bengio
Navdeep Jaitly
Noam M. Shazeer
Advances in Neural Information Processing Systems, NIPS (2015)

Abstract

Recurrent Neural Networks can be trained to produce sequences of tokens given
some input, as exemplified by recent results in machine translation and image
captioning. The current approach to training them consists of maximizing the
likelihood of each token in the sequence given the current (recurrent) state
and the previous token. At inference time, the unknown previous token is then
replaced by a token generated by the model itself. This discrepancy between
training and inference can yield errors that accumulate quickly along the
generated sequence.
We propose a curriculum learning strategy to gently change the
training process from a fully guided scheme using the true previous token,
towards a less guided scheme which mostly uses the generated token instead.
Experiments on several sequence prediction tasks show that this approach
yields significant improvements. Moreover, it was used successfully
in our winning entry to the MSCOCO image captioning challenge, 2015.
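As a concrete illustration of the procedure described above, the sketch below shows a per-token coin flip between the ground-truth previous token and the model's own sampled token, together with the three decay schedules (linear, exponential, inverse sigmoid) considered in the paper for the sampling probability epsilon. The RNN is replaced by a dummy stand-in and all function names are illustrative; this is a minimal sketch of the idea, not the authors' implementation.

```python
# Minimal sketch of scheduled sampling (illustrative, not the authors' code).
# The decay schedules follow the paper; "model_step" is a toy stand-in for a
# real recurrent cell.
import math
import random


def epsilon_linear(step, k=1.0, c=1e-4, eps_min=0.0):
    """Linear decay: epsilon_i = max(eps_min, k - c * i)."""
    return max(eps_min, k - c * step)


def epsilon_exponential(step, k=0.999):
    """Exponential decay: epsilon_i = k ** i, with k < 1."""
    return k ** step


def epsilon_inverse_sigmoid(step, k=1000.0):
    """Inverse-sigmoid decay: epsilon_i = k / (k + exp(i / k)), with k >= 1."""
    return k / (k + math.exp(step / k))


def scheduled_inputs(true_tokens, model_step, epsilon, rng, bos=0):
    """Process one sequence, flipping a coin at every time step to decide
    whether the next input is the ground-truth token (probability epsilon)
    or the token the model just produced (probability 1 - epsilon)."""
    prev = bos
    inputs_used = []
    for true_tok in true_tokens:
        inputs_used.append(prev)
        sampled = model_step(prev)  # model's prediction at this step
        # Per-token coin flip: teacher forcing vs. feeding back the sample.
        prev = true_tok if rng.random() < epsilon else sampled
    return inputs_used


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 10
    dummy_model_step = lambda prev: rng.randrange(vocab_size)  # stand-in RNN
    target = [3, 1, 4, 1, 5, 9, 2, 6]
    for step in (0, 5_000, 50_000):
        eps = epsilon_inverse_sigmoid(step)
        used = scheduled_inputs(target, dummy_model_step, eps, rng)
        print(f"training step {step:>6}: epsilon={eps:.3f}, inputs fed: {used}")
```

Early in training epsilon stays close to 1, so the loop behaves like standard teacher forcing; as training proceeds and epsilon decays, the model is increasingly fed its own samples, matching the conditions it will face at inference time.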
