The SMAPH System for Query Entity Recognition and Disambiguation
Venue
ERD 2014: Entity Recognition and Disambiguation Challenge. SIGIR Forum, ACM
Publication Year
2014
Authors
Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita, Stefan Rued, Hinrich Schuetze
Abstract
The SMAPH system implements a pipeline of four main steps: (1) Fetching – it
fetches the search results returned by a search engine given the query to be
annotated; (2) Spotting – search result snippets are parsed to identify candidate
mentions for the entities to be annotated. This is done in a novel way: the
keywords-in-context are detected from the bold parts of the search snippets; (3)
Candidate generation – candidate entities are generated in two ways: from the
Wikipedia pages occurring in the search results, and from an existing annotator,
using the mentions identified in the spotting step as input; (4) Pruning – a binary
SVM classifier is used to decide which entities to keep/discard in order to
generate the final annotation set for the query. The SMAPH system ranked third on
the development set and first on the final blind test of the 2014 ERD Challenge
short text track.
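
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SMAPH implementation: every name here (SearchResult, spot_mentions, generate_candidates, annotate_query, and the toy search, annotator, and keep functions) is hypothetical, snippets are assumed to mark matched terms with <b> tags, and the keep predicate merely stands in for the binary SVM classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

BOLD_RE = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)
WIKI_RE = re.compile(r"^https?://en\.wikipedia\.org/wiki/([^#?]+)")


@dataclass(frozen=True)
class SearchResult:
    url: str
    snippet_html: str  # assumption: matched terms are wrapped in <b>...</b>


def spot_mentions(results: List[SearchResult]) -> List[str]:
    """Step 2: collect the bold parts of the snippets as candidate mentions."""
    mentions: List[str] = []
    for result in results:
        for bold in BOLD_RE.findall(result.snippet_html):
            text = " ".join(bold.split()).lower()  # normalise whitespace and case
            if text and text not in mentions:
                mentions.append(text)
    return mentions


def generate_candidates(
    results: List[SearchResult],
    mentions: List[str],
    annotate: Callable[[str], List[str]],
) -> List[str]:
    """Step 3: candidates from Wikipedia result URLs and from an annotator."""
    candidates: List[str] = []
    for result in results:
        match = WIKI_RE.match(result.url)
        if match:
            candidates.append(match.group(1))
    for mention in mentions:
        candidates.extend(annotate(mention))
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


def annotate_query(
    query: str,
    search: Callable[[str], List[SearchResult]],
    annotate: Callable[[str], List[str]],
    keep: Callable[[str, str], bool],
) -> List[str]:
    results = search(query)                                        # (1) Fetching
    mentions = spot_mentions(results)                              # (2) Spotting
    candidates = generate_candidates(results, mentions, annotate)  # (3) Candidates
    return [e for e in candidates if keep(query, e)]               # (4) Pruning


# Toy stand-ins for the search engine, the annotator, and the classifier.
def toy_search(query: str) -> List[SearchResult]:
    return [
        SearchResult(
            "https://en.wikipedia.org/wiki/Neil_Armstrong",
            "<b>Neil Armstrong</b> was the first person to walk on the <b>Moon</b>.",
        ),
        SearchResult(
            "https://example.com/apollo",
            "The Apollo 11 <b>moon landing</b> took place in 1969.",
        ),
    ]


TOY_ANNOTATIONS = {
    "neil armstrong": ["Neil_Armstrong"],
    "moon": ["Moon"],
    "moon landing": ["Moon_landing"],
}


def toy_annotate(mention: str) -> List[str]:
    return TOY_ANNOTATIONS.get(mention, [])


def toy_keep(query: str, entity: str) -> bool:
    return entity != "Moon"  # placeholder decision, not a trained classifier


annotations = annotate_query(
    "armstrong moon landing", toy_search, toy_annotate, toy_keep
)
print(annotations)  # ['Neil_Armstrong', 'Moon_landing']
```

In a real system the keep predicate would be replaced by a trained classifier over features of the query, mention, and entity; the sketch only shows how the steps connect.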
