A New Entity Salience Task with Millions of Training Examples
Venue
Proceedings of the European Association for Computational Linguistics, Association for Computational Linguistics (2014)
Publication Year
2014
Authors
Dan Gillick, Jesse Dunietz
Abstract
Although many NLP systems are moving toward entity-based processing, most still
identify important phrases using classical keyword-based approaches. To bridge this
gap, we introduce the task of entity salience: assigning a relevance score to each
entity in a document. We demonstrate how a labeled corpus for the task can be
automatically generated from a corpus of documents and accompanying abstracts. We
then show how a classifier with features derived from a standard NLP pipeline
outperforms a strong baseline by 34%. Finally, we outline initial experiments on
further improving accuracy by leveraging background knowledge about the
relationships between entities.
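
The automatic labeling step mentioned in the abstract could be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that an entity counts as salient when one of its mentions also appears in the document's abstract, a rule the abstract above does not spell out. The function names, the simple tokenizer, and the example data are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def label_salience(entity_mentions, abstract):
    """Map each entity id to 1 (salient) or 0 (not salient).

    entity_mentions: dict of entity id -> list of mention strings from the document.
    Assumed rule (not stated in the abstract above): an entity is salient
    if any of its mentions also appears in the accompanying abstract.
    """
    abstract_tokens = tokenize(abstract)
    return {
        entity: int(any(contains_phrase(abstract_tokens, tokenize(m)) for m in mentions))
        for entity, mentions in entity_mentions.items()
    }


# Hypothetical document entities and abstract, for illustration only.
example_labels = label_salience(
    {"obama": ["Barack Obama", "Obama"], "chicago": ["Chicago"]},
    "Obama outlines a new budget proposal.",
)
print(example_labels)
```

A real system would presumably align entities through a full NLP pipeline (for example, coreference resolution) instead of string matching. The sketch only shows how document and abstract pairs can yield labels without manual annotation.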
