Topical clustering of search results
Venue
Proceedings of the fifth ACM international conference on Web search and data mining, ACM, New York, NY, USA (2012), pp. 223-232
Publication Year
2012
Authors
Ugo Scaiella, Paolo Ferragina, Andrea Marino, Massimiliano Ciaramita
Abstract
Search results clustering (SRC) is a challenging algorithmic problem that requires
grouping the results returned by one or more search engines into topically
coherent clusters, and labeling the clusters with meaningful phrases describing the
topics of the results included in them. In this paper we propose to solve SRC via
an innovative approach that consists of modeling the problem as the labeled
clustering of the nodes of a newly introduced graph of topics. The topics are
Wikipedia pages identified by means of recently proposed topic annotators [9, 11,
16, 20] applied to the search results, and the edges denote the relatedness among
these topics, computed from the link structure of the Wikipedia graph. We
tackle this problem by designing a novel algorithm that exploits the spectral
properties and the labels of that graph of topics. We show the superiority of our
approach with respect to academic state-of-the-art work [6] and well-known
commercial systems (CLUSTY and LINGO3G) by performing an extensive set of
experiments on standard datasets and user studies via Amazon Mechanical Turk. We
test several standard measures for evaluating the performance of all systems and
show a relative improvement of up to 20%.
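
The abstract describes clustering the nodes of a topic graph using its spectral properties. As a rough illustration of that general idea only, the sketch below splits a tiny weighted topic graph in two by the sign of an approximate Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). This is plain spectral bisection, not the paper's algorithm, which the abstract does not specify. The topic names, the edge weights, and the helper names `fiedler_vector` and `bisect_topics` are all assumptions made for the example.

```python
import math

# Hypothetical topics for the ambiguous query "jaguar". Names and weights are
# made up for illustration and do not come from the paper.
TOPICS = ["Jaguar Cars", "Automobile", "Luxury vehicle",
          "Jaguar (animal)", "Felidae", "Panthera"]

# Symmetric relatedness matrix: WEIGHTS[i][j] is the edge weight between
# topics i and j (0.0 means no edge).
WEIGHTS = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.05, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.05, 0.9, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.7, 0.0],
]


def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of a connected graph with >= 2 nodes.

    Uses power iteration on M = shift*I - L, where L = D - W is the graph
    Laplacian, while projecting out the constant vector at every step.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # Eigenvalues of L lie in [0, 2*max_degree], so M is positive definite
    # and its largest eigenvalues correspond to the smallest ones of L.
    shift = 2.0 * max(degrees) + 1.0
    # Deterministic, non-constant start vector with zero mean.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # Remove the component along the constant vector (eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # w = M v = (shift - degree_i) * v_i + sum_j W_ij * v_j
        w = [(shift - degrees[i]) * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def bisect_topics(weights):
    """Split node indices into two groups by the sign of the Fiedler vector."""
    v = fiedler_vector(weights)
    negative = [i for i, x in enumerate(v) if x < 0]
    non_negative = [i for i, x in enumerate(v) if x >= 0]
    return negative, non_negative


if __name__ == "__main__":
    for cluster in bisect_topics(WEIGHTS):
        print([TOPICS[i] for i in cluster])
```

With these made-up weights, the car-related topics and the animal-related topics fall on opposite sides of the split, because the two groups are densely connected internally and only weakly connected to each other. A real system would also need to choose the number of clusters and assign a label to each one, which this sketch leaves out.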
