Finding Related Tables
Venue
SIGMOD (2012)
Publication Year
2012
Authors
Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Y. Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu
Abstract
We consider the problem of finding related tables in a large corpus of heterogeneous
tables. Detecting related tables provides users a powerful tool for enhancing their
tables with additional data and enables effective reuse of available public data.
Our first contribution is a framework that captures several types of relatedness,
including tables that are candidates for joins and tables that are candidates for
union. Our second contribution is a set of algorithms for detecting related tables
that can be either unioned or joined. We describe a set of experiments that
demonstrate that our algorithms produce highly related tables. We also show that we
can often improve the results of table search by promoting lower-ranked tables that
are related to the top-ranked tables. Finally, we describe
how to scale up our algorithms and show the results of running them on a corpus of
over a million tables extracted from Wikipedia.
