Discovering Structure in the Universe of Attribute Names
Venue
Proc. 25th International Conference on World Wide Web (WWW) (2016) (to appear)
Publication Year
2016
Authors
Alon Halevy, Natalya Noy, Sunita Sarawagi, Steven Euijong Whang, Xiao Yu
Abstract
Recently, search engines have invested significant effort in answering
entity-attribute queries from structured data, but have focused mostly on queries
for frequently occurring attributes. In parallel, several research efforts have
demonstrated that there is a long tail of attributes, often thousands per class of
entities, that are of interest to users. Researchers are beginning to leverage
these new collections of attributes to expand the ontologies that power search
engines and to recognize entity-attribute queries. Because of the sheer number of
potential attributes, such tasks require us to impose some structure on this long
and heavy tail of attributes. This paper introduces the problem of organizing the
attributes by expressing the compositional structure of their names as a rule-based
grammar. These rules offer a compact and rich semantic interpretation of multi-word
attributes, while generalizing from the observed attributes to new unseen ones. The
paper describes an unsupervised learning method to generate such a grammar
automatically from a large set of attribute names. Experiments show that our method
can discover a precise grammar over 100,000 attributes of Countries while
providing a 40-fold compaction over the attribute names. Furthermore, our grammar
enables us to increase the precision of attributes from 47% to more than 90% with
only a minimal curation effort. Thus, our approach provides an efficient and
scalable way to expand ontologies with attributes of user interest.
