Sentence Compression by Deletion with LSTMs
Venue
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP'15)
Publication Year
2015
Authors
Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser, Oriol Vinyals
BibTeX
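@inproceedings{filippova2015sentence,
  author    = {Filippova, Katja and Alfonseca, Enrique and Colmenares, Carlos A. and Kaiser, Lukasz and Vinyals, Oriol},
  title     = {Sentence Compression by Deletion with {LSTM}s},
  booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP'15)},
  year      = {2015}
}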
Abstract
We present an LSTM approach to deletion-based sentence compression where the task
is to translate a sentence into a sequence of zeros and ones, corresponding to
token deletion decisions. We demonstrate that even the most basic version of the
system, which is given no syntactic information (no PoS or NE tags, or
dependencies) or desired compression length, performs surprisingly well: around 30%
of the compressions from a large test set could be regenerated. We compare the LSTM
system with a competitive baseline that is trained on the same amount of data but
is additionally provided with a wide range of linguistic features. In an experiment
with human raters, the LSTM-based model outperforms the baseline, achieving 4.5 in
readability and 3.8 in informativeness.
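
To make the "sequence of zeros and ones" formulation concrete, below is a minimal PyTorch sketch of a per-token keep/delete LSTM tagger. It illustrates the deletion-as-sequence-labeling framing only; it is not the authors' architecture (the paper's model is a deeper, seq2seq-style LSTM), and the class name, layer sizes, and 0.5 decision threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeletionTagger(nn.Module):
    """Labels each input token 1 (keep) or 0 (delete).

    Simplified sketch of the deletion-as-tagging idea from the abstract;
    all names and dimensions here are illustrative assumptions, not the
    paper's configuration.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)  # one keep/delete logit per token

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> per-token logits: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.score(hidden).squeeze(-1)

# Toy usage: two random 6-token "sentences" over a 1000-word vocabulary.
model = DeletionTagger(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 6))
logits = model(tokens)
keep_mask = torch.sigmoid(logits) > 0.5  # True = keep token, False = delete
loss_fn = nn.BCEWithLogitsLoss()         # train against gold 0/1 deletion labels
```

In this framing, training data would consist of sentence/compression pairs converted into gold 0/1 labels per token, with a binary cross-entropy loss over the per-token logits.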