Discovering Structure in the Universe of Attribute Names
Venue
Proc. 25th International World Wide Web Conference (2016)
Publication Year
2016
Authors
Alon Halevy, Natalya Fridman Noy, Sunita Sarawagi, Steven Euijong Whang, Xiao Yu
Abstract
Recently, search engines have invested significant effort in answering
entity-attribute queries from structured data, but have focused mostly on queries
for frequent attributes. In parallel, several research efforts have demonstrated
that there is a long tail of attributes, often thousands per class of entities,
that are of interest to users. Researchers are beginning to leverage these new
collections of attributes to expand the ontologies that power search engines and to
recognize entity-attribute queries. Because of the sheer number of potential
attributes, such tasks require us to impose some structure on this long and heavy
tail of attributes. This paper introduces the problem of organizing the attributes
by expressing the compositional structure of their names as a rule-based grammar.
These rules offer a compact and rich semantic interpretation of multi-word
attributes, while generalizing from the observed attributes to new unseen ones. The
paper describes an unsupervised learning method to generate such a grammar
automatically from a large set of attribute names. Experiments show that our method
can discover a precise grammar over 100,000 attributes of Countries while
providing a 40-fold compaction over the attribute names. Furthermore, our grammar
enables us to increase the precision of attributes from 47% to more than 90% with
only minimal curation effort. Thus, our approach provides an efficient and
scalable way to expand ontologies with attributes of user interest.
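To make the idea of a rule-based grammar over attribute names concrete, the toy sketch below shows how a handful of rules can cover many multi-word attribute names, bind their parts to semantic slots, and accept unseen names. It is only an illustration under stated assumptions: the categories (YEAR, INDICATOR, GROUP), the rules, and the functions `match`, `parse`, and `compaction` are all hypothetical and hand-written here. It does not implement the unsupervised learning method described above, which induces the grammar automatically. The "compaction" it reports is simply the number of covered attribute names divided by the number of rules used.

```python
import re

# Hypothetical categories a rule slot may reference. A learned grammar
# would induce these; here they are hand-coded predicates on one token.
CATEGORIES = {
    "YEAR": lambda tok: re.fullmatch(r"(19|20)\d\d", tok) is not None,
    "INDICATOR": lambda tok: tok in {"gdp", "population", "inflation", "exports"},
    "GROUP": lambda tok: tok in {"urban", "rural", "female", "male"},
}

# Hypothetical rules: each is a sequence of literal words or <CATEGORY> slots.
RULES = [
    ("<YEAR>", "<INDICATOR>"),
    ("<GROUP>", "<INDICATOR>"),
    ("<INDICATOR>", "growth", "rate"),
]


def match(rule, tokens):
    """Return slot bindings if the tokens fit the rule, else None."""
    if len(rule) != len(tokens):
        return None
    bindings = {}
    for sym, tok in zip(rule, tokens):
        if sym.startswith("<") and sym.endswith(">"):
            name = sym[1:-1]
            if not CATEGORIES[name](tok):
                return None
            bindings[name] = tok  # semantic role of this word
        elif sym != tok:
            return None  # literal word does not match
    return bindings


def parse(attribute):
    """Return (rule, bindings) for the first rule covering the name, or None."""
    tokens = attribute.lower().split()
    for rule in RULES:
        bindings = match(rule, tokens)
        if bindings is not None:
            return rule, bindings
    return None


def compaction(attributes):
    """Covered attribute names per rule actually used (0.0 if none covered)."""
    parses = [parse(a) for a in attributes]
    covered = [p for p in parses if p is not None]
    used_rules = {rule for rule, _ in covered}
    return len(covered) / len(used_rules) if used_rules else 0.0


if __name__ == "__main__":
    attrs = ["2010 gdp", "2011 gdp", "2012 population", "urban population",
             "rural population", "female population", "gdp growth rate",
             "inflation growth rate"]
    print(parse("2010 GDP"))
    print(parse("2023 exports"))  # unseen name, still covered by a rule
    print(compaction(attrs))      # 8 names / 3 rules
```

In this toy setting, eight attribute names are explained by three rules, and a name such as "2023 exports" that never appeared in the input list is still parsed, which mirrors (at a very small scale) the compaction and generalization properties the abstract attributes to a learned grammar.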
