Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data
Venue
Making Sense of Microposts (at WWW 2012)
Publication Year
2012
Authors
Davi Reis, Felipe Goldstein, Frederico Quintao
Abstract
In recent years, a new type of content has become ubiquitous on the web.
These are small and noisy text snippets, created by users of social networks such
as Twitter and Facebook. The full interpretation of those microposts by machines
poses tremendous challenges, since they strongly rely on context. In this paper we
propose a task which is much simpler than full interpretation of microposts: we aim
to build classification systems to detect keywords that unambiguously refer to a
single dominant concept, even when taken out of context. For example, in the
context of this task, apple would be classified as ambiguous whereas microsoft would
not. The contribution of this work is twofold. First, we formalize this novel
classification task that can be applied directly to extracting information from
microposts. Second, we show how high-precision classifiers for this problem can be
built from Web data and search engine logs, combining traditional information
retrieval metrics, such as inverse document frequency, with new ones derived from
search query logs. Finally, we propose and evaluate relevant applications
for these classifiers, which achieve precision ≥ 72% and recall ≥ 56% on
unambiguous keyword extraction from microposts. We also compare these results with
those of closely related systems, none of which outperforms these numbers.
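
As a rough illustration of two quantities the abstract mentions, the sketch below computes inverse document frequency over a toy corpus and precision and recall for a set of extracted keywords. It is a minimal sketch under stated assumptions, not the authors' classifier: the function names, the whitespace tokenization, and the four-document corpus are all hypothetical, and IDF alone is not a measure of ambiguity (the paper combines it with features derived from query logs, which are not reproduced here).

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df), where df is the number of documents containing term.

    Tokenization is a plain lowercase whitespace split (an assumption made
    for this sketch only).
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc.lower().split())
    if doc_freq == 0:
        return float("inf")  # term never observed in the corpus
    return math.log(n_docs / doc_freq)


def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted keywords against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy corpus, for illustration only.
TOY_DOCS = [
    "apple pie recipe",
    "apple releases new phone",
    "microsoft releases new software",
    "best pie in town",
]

if __name__ == "__main__":
    for keyword in ("apple", "microsoft"):
        print(keyword, inverse_document_frequency(keyword, TOY_DOCS))
    print(precision_recall(["microsoft", "apple"], ["microsoft", "google"]))
```

In this toy corpus the rarer term receives the higher IDF, which reflects only how often a term occurs, not how many senses it has. That gap is one reason a classifier for this task would need further signals beyond IDF.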
