Publication Data
Finding Related Tables
Abstract: We consider the problem of finding related tables in a large
corpus of heterogeneous tables. Detecting related tables provides users with a powerful tool
for enhancing their tables with additional data and enables effective reuse of
available public data. Our first contribution is a framework that captures several
types of relatedness, including tables that are candidates for joins and tables that
are candidates for union. Our second contribution is a set of algorithms for detecting
related tables that can be either unioned or joined. We describe a set of experiments
that demonstrate that our algorithms produce highly related tables. We also show that
we can often improve the results of table search by using relatedness to top-ranked
tables to pull up tables that would otherwise be ranked much lower. Finally, we describe how to
scale up our algorithms and show the results of running them on a corpus of over a
million tables extracted from Wikipedia.
