A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries
Venue
WWW 2016
Publication Year
2016
Authors
Hinrich Schuetze, Marco Cornolti, Massimiliano Ciaramita, Paolo Ferragina, Stefan Rued
Abstract
In this paper we study the problem of linking open-domain web-search queries
towards entities drawn from the full entity inventory of Wikipedia articles. We
introduce SMAPH-2 to attack this problem, a second-order approach that, by
piggybacking on a web search engine, alleviates the noise and irregularities that
characterize the language of queries and puts queries in a larger context in which
it is easier to make sense of them. The key algorithmic idea underlying SMAPH-2
is to first discover a candidate set of entities and then link back those entities
to their mentions occurring in the input query. This allows us to confine the
possible concepts pertinent to the query to only the ones really mentioned in it.
The link-back is implemented via a collective disambiguation step based upon a
supervised ranking model that makes one joint prediction for the annotation of the
complete query, directly optimizing the F1 measure. We evaluate both known
features, such as word embeddings and semantic relatedness among entities, and
several novel features such as an approximate distance between mentions and
entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves
state-of-the-art on the ERD@SIGIR2014 benchmark. We also publish GERDAQ, a novel
dataset we built specifically for web-query entity linking via a crowdsourcing
effort, and show that SMAPH-2 outperforms the benchmarks by comparable margins on
GERDAQ.
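
To make the link-back idea concrete, the following is a minimal sketch, not the SMAPH-2 implementation. It assumes the candidate entities have already been discovered (for example from search-engine results) and are given as Wikipedia-style titles. Each candidate is linked back to a query segment using a normalized edit distance, which tolerates spelling errors. The function names (`edit_distance`, `link_back`), the threshold `max_dist`, and the greedy selection are illustrative assumptions; the paper instead uses a supervised ranking model that makes one joint prediction for the whole query.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def link_back(query, candidate_titles, max_dist=0.4):
    """Greedily link candidate entities back to non-overlapping query segments.

    max_dist is a hypothetical threshold on the normalized edit distance;
    it is an assumption of this sketch, not a value taken from the paper.
    """
    tokens = query.lower().split()
    scored = []
    # Enumerate every contiguous token span of the query as a possible mention.
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            mention = " ".join(tokens[i:j])
            for title in candidate_titles:
                t = title.lower()
                d = edit_distance(mention, t) / max(len(mention), len(t), 1)
                if d <= max_dist:
                    scored.append((d, i, j, title))
    # Keep the closest matches first, skipping spans that overlap earlier picks.
    scored.sort()
    used, picked = set(), []
    for d, i, j, title in scored:
        if used.isdisjoint(range(i, j)):
            used.update(range(i, j))
            picked.append((i, " ".join(tokens[i:j]), title))
    picked.sort()
    return [(mention, title) for _, mention, title in picked]


annotations = link_back("armstrong mon lading",
                        ["Moon landing", "Neil Armstrong", "Apollo 11"])
print(annotations)
```

In this example the misspelled segment "mon lading" is two edits away from "moon landing", so it is still linked, while "Apollo 11" is dropped because no query segment is close to it. This mirrors the abstract's point that link-back confines the annotation to entities actually mentioned in the query.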
