Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Venue
Advances in Neural Information Processing Systems (NIPS 2015)
Publication Year
2015
Authors
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam M. Shazeer
Abstract
Recurrent Neural Networks can be trained to produce sequences of tokens given some
input, as exemplified by recent results in machine translation and image
captioning. The current approach to training them consists of maximizing the
likelihood of each token in the sequence given the current (recurrent) state and
the previous token. At inference, the unknown previous token is then replaced by a
token generated by the model itself. This discrepancy between training and
inference can yield errors that can accumulate quickly along the generated
sequence. We propose a curriculum learning strategy to gently change the training
process from a fully guided scheme using the true previous token, towards a less
guided scheme which mostly uses the generated token instead. Experiments on several
sequence prediction tasks show that this approach yields significant improvements.
Moreover, it was used successfully in our winning entry to the MSCOCO image
captioning challenge, 2015.
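The curriculum described above is the core mechanism: at each decoding step during training, the network is fed the ground-truth previous token with some probability, and its own previous prediction otherwise, with that probability decayed over training (the paper describes linear, exponential, and inverse-sigmoid decay schedules). Below is a minimal sketch of such a training step, assuming PyTorch; the Decoder module, hyperparameters, and decay constant k are illustrative, not the authors' implementation.

import math
import random
import torch
import torch.nn as nn

def inverse_sigmoid_decay(step: int, k: float = 100.0) -> float:
    # Probability of feeding the ground-truth token at a given training step;
    # decays from ~1 toward 0 as training progresses (k is a tunable constant).
    return k / (k + math.exp(step / k))

class Decoder(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn_cell = nn.GRUCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, targets: torch.Tensor, teacher_prob: float) -> torch.Tensor:
        # targets: (batch, seq_len) token ids, position 0 being a start token.
        # Returns the summed cross-entropy loss over the sequence.
        batch, seq_len = targets.shape
        h = torch.zeros(batch, self.rnn_cell.hidden_size)
        prev = targets[:, 0]
        criterion = nn.CrossEntropyLoss()
        loss = torch.tensor(0.0)
        for t in range(1, seq_len):
            h = self.rnn_cell(self.embed(prev), h)
            logits = self.out(h)
            loss = loss + criterion(logits, targets[:, t])
            # Scheduled sampling: flip a coin to decide whether the next input
            # is the true previous token or the model's own generated token.
            if random.random() < teacher_prob:
                prev = targets[:, t]              # guided (teacher forcing)
            else:
                prev = logits.argmax(dim=-1)      # model's own prediction
        return loss

In a training loop one would call, for example, loss = decoder(batch, teacher_prob=inverse_sigmoid_decay(step)), so that early steps are almost fully guided and later steps mostly condition on generated tokens, which narrows the gap between training and inference conditions.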
