Learning to Rank Answers to Non-Factoid Questions from Web Collections
Venue
Computational Linguistics, vol. 37 (2011), pp. 351-383
Publication Year
2011
Authors
Mihai Surdeanu, Massimiliano Ciaramita, Hugo Zaragoza
Abstract
This work investigates the use of linguistically motivated features to improve
search, in particular for ranking answers to non-factoid questions. We show that it
is possible to exploit existing large collections of question–answer pairs (from
online social Question Answering sites) to extract such features and train ranking
models which combine them effectively. We investigate a wide range of feature
types, some exploiting natural language processing such as coarse word sense
disambiguation, named-entity identification, syntactic parsing, and semantic role
labeling. Our experiments demonstrate that linguistic features, in combination,
yield considerable improvements in accuracy. Depending on the system settings we
measure relative improvements of 14% to 21% in Mean Reciprocal Rank and
Precision@1, providing some of the most compelling evidence to date that complex
linguistic features such as word senses and semantic roles can have a significant
impact on large-scale information retrieval tasks.
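
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Mean Reciprocal Rank, Precision@1, and a relative improvement are commonly computed. It is not the authors' evaluation code. The function names and the input format (one list of booleans per question, ordered by rank, with True marking a correct answer) are assumptions made for illustration.

```python
from typing import List, Sequence


def reciprocal_rank(ranked_labels: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for position, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: List[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def precision_at_1(queries: List[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for q in queries if len(q) > 0 and q[0]) / len(queries)


def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline` (0.2 means 20%)."""
    return (system - baseline) / baseline


# Hypothetical rankings for three questions (illustrative data only).
example_queries = [
    [True, False],                 # correct answer at rank 1
    [False, True],                 # correct answer at rank 2
    [False, False, False, True],   # correct answer at rank 4
]
mrr = mean_reciprocal_rank(example_queries)
p_at_1 = precision_at_1(example_queries)
```

A relative improvement such as those reported in the abstract would then be obtained by calling `relative_improvement` with a baseline score and the score of the model using the additional features.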
