Generating Sentences from a Continuous Space
Venue
CoNLL 2016
Publication Year
2016
Authors
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
BibTeX
@inproceedings{bowman2016generating,
  title     = {Generating Sentences from a Continuous Space},
  author    = {Bowman, Samuel R. and Vilnis, Luke and Vinyals, Oriol and Dai, Andrew M. and Jozefowicz, Rafal and Bengio, Samy},
  booktitle = {Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL)},
  year      = {2016}
}
Abstract
The standard recurrent neural network language model (RNNLM) generates sentences
one word at a time and does not work from an explicit global sentence
representation. In this work, we introduce and study an RNN-based variational
autoencoder generative model that incorporates distributed latent representations
of entire sentences. This factorization allows it to explicitly model holistic
properties of sentences such as style, topic, and high-level syntactic features.
Samples from the prior over these sentence representations remarkably produce
diverse and well-formed sentences through simple deterministic decoding. By
examining paths through this latent space, we are able to generate coherent novel
sentences that interpolate between known sentences. We present techniques for
solving the difficult learning problem presented by this model, demonstrate its
effectiveness in imputing missing words, explore many interesting properties of the
model's latent sentence space, and present negative results on the use of the model
in language modeling.
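
To make the architecture concrete, the following is a minimal PyTorch sketch of a sentence VAE of the kind the abstract describes: an RNN encoder maps a sentence to a diagonal-Gaussian posterior over a latent code, and an RNN decoder reconstructs the sentence conditioned on that code. The use of GRUs, the layer sizes, and all names here are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, latent_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # The final encoder state parameterizes a diagonal Gaussian posterior.
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # The latent code conditions the decoder's initial hidden state.
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def encode(self, tokens):
        _, h = self.encoder(self.embed(tokens))   # h: (1, batch, hidden)
        h = h.squeeze(0)
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps: the standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def decode(self, z, tokens):
        # Teacher forcing: inputs are the gold tokens, targets the same
        # tokens shifted by one position.
        h0 = torch.tanh(self.latent_to_hidden(z)).unsqueeze(0)
        output, _ = self.decoder(self.embed(tokens), h0)
        return self.out(output)                   # per-step vocabulary logits

    def forward(self, tokens):
        mu, logvar = self.encode(tokens)
        z = self.reparameterize(mu, logvar)
        logits = self.decode(z, tokens)
        # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over dims.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
        return logits, kl

Training minimizes the reconstruction cross-entropy from logits plus the kl term (the standard evidence lower bound); the "techniques for solving the difficult learning problem" the abstract alludes to are KL cost annealing and word dropout, which keep the decoder from simply ignoring z.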

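The abstract's two generation procedures, sampling from the prior with deterministic decoding and interpolating between known sentences in latent space, reduce to a few lines on top of the sketch above. This is a hedged illustration: greedy argmax decoding stands in for "simple deterministic decoding", linear interpolation between posterior means stands in for "paths through this latent space", and model, bos_id, and eos_id are assumed to come from the sketch and the data pipeline.

import torch

@torch.no_grad()
def greedy_decode(model, z, bos_id, eos_id, max_len=30):
    # Deterministic decoding: always take the argmax token at each step.
    h = torch.tanh(model.latent_to_hidden(z)).unsqueeze(0)
    token = torch.tensor([[bos_id]])
    out_tokens = []
    for _ in range(max_len):
        step, h = model.decoder(model.embed(token), h)
        token = model.out(step).argmax(dim=-1)
        if token.item() == eos_id:
            break
        out_tokens.append(token.item())
    return out_tokens

@torch.no_grad()
def interpolate(model, sent_a, sent_b, bos_id, eos_id, steps=5):
    # Decode points on the straight line between the posterior means of two
    # sentences; the paper reports the intermediate sentences stay coherent.
    mu_a, _ = model.encode(sent_a)
    mu_b, _ = model.encode(sent_b)
    sentences = []
    for i in range(steps + 1):
        t = i / steps
        z = (1 - t) * mu_a + t * mu_b
        sentences.append(greedy_decode(model, z, bos_id, eos_id))
    return sentences

Sampling from the prior is then the degenerate case greedy_decode(model, torch.randn(1, 64), bos_id, eos_id): a draw from N(0, I) in the latent dimension assumed above, decoded deterministically.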