Semi-supervised sequence learning
Venue
Advances in Neural Information Processing Systems (NIPS 2015)
Publication Year
2015
Authors
Andrew M. Dai, Quoc V. Le
Abstract
We present two approaches that use unlabeled data to improve sequence learning with
recurrent networks. The first approach is to predict what comes next in a sequence,
which is a conventional language model in natural language processing. The second
approach is to use a sequence autoencoder, which reads the input sequence into a
vector and predicts the input sequence again. These two algorithms can be used as a
“pretraining” step for a later supervised sequence learning algorithm. In other
words, the parameters obtained from the unsupervised step can be used as a starting
point for other supervised training models. In our experiments, we find that long
short-term memory recurrent networks pretrained with the two approaches are more
stable and generalize better. With pretraining, we are able to train long short-term
memory recurrent networks up to a few hundred timesteps, thereby achieving strong
performance on many text classification tasks, such as IMDB, DBpedia and 20
Newsgroups.
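
A minimal, hypothetical sketch of the second approach (the sequence autoencoder used to pretrain an LSTM), written here in PyTorch: the layer sizes, vocabulary size, and random batch are illustrative assumptions, not details from the paper. The language-model variant would differ only in the target, predicting the next token instead of reconstructing the whole input.

```python
import torch
import torch.nn as nn

class SequenceAutoencoder(nn.Module):
    """Encoder LSTM reads the token sequence into a fixed vector (its final
    state); a decoder LSTM then tries to reproduce the same sequence."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, state = self.encoder(emb)              # final (h, c) summarizes the input
        dec_out, _ = self.decoder(emb, state)     # teacher-forced reconstruction
        return self.out(dec_out)                  # logits over the vocabulary

# Unsupervised pretraining step: the target is the input sequence itself.
model = SequenceAutoencoder(vocab_size=20_000)    # placeholder vocabulary size
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 20_000, (8, 100))       # stand-in batch, 100 timesteps
loss = loss_fn(model(tokens).transpose(1, 2), tokens)
loss.backward()

# Supervised step: reuse model.embed and model.encoder as the starting point of
# an LSTM text classifier (e.g. IMDB sentiment), adding only a small output head.
classifier_head = nn.Linear(256, 2)
```
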
