Publication Data
Learning to Rank Answers to Non-Factoid Questions from Web Collections
Abstract: This work investigates the use of linguistically motivated
features to improve search, in particular for ranking answers to non-factoid questions.
We show that it is possible to exploit existing large collections of question–answer
pairs (from online social Question Answering sites) to extract such features and train
ranking models which combine them effectively. We investigate a wide range of feature
types, some exploiting natural language processing such as coarse word sense
disambiguation, named-entity identification, syntactic parsing, and semantic role
labeling. Our experiments demonstrate that linguistic features, in combination, yield
considerable improvements in accuracy. Depending on the system settings, we measure
relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing
some of the most compelling evidence to date that complex linguistic features such as
word senses and semantic roles can have a significant impact on large-scale information
retrieval tasks.
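
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Mean Reciprocal Rank, Precision@1, and a relative improvement are conventionally computed. It assumes each question has a ranked list of candidate answers marked correct or incorrect. The function names and the toy data are hypothetical illustrations, not the paper's evaluation code or results.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(relevance, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    if not rankings:
        return 0.0
    hits = sum(1 for r in rankings if len(r) > 0 and r[0])
    return hits / len(rankings)


def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over a non-zero `baseline` score."""
    return (system - baseline) / baseline


# Toy data (invented for illustration): one ranked list per question,
# True marks the position of a correct answer.
toy_rankings = [
    [True, False],          # correct answer at rank 1
    [False, True],          # correct answer at rank 2
    [False, False, False],  # no correct answer retrieved
]

mrr = mean_reciprocal_rank(toy_rankings)
p_at_1 = precision_at_1(toy_rankings)
```

Under these definitions, a relative improvement of 14% to 21% means the ranking model's score is 1.14 to 1.21 times the baseline's score on the same metric, not a gain of 14 to 21 absolute points.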
