Web-scale Data Integration: You can only afford to Pay As You Go
Venue
CIDR (2007)
Publication Year
2007
Authors
Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy
Abstract
The World Wide Web is witnessing an increase in the amount of structured content -
vast heterogeneous collections of structured data are on the rise due to the Deep
Web, annotation schemes like Flickr, and sites like Google Base. While this
phenomenon is creating an opportunity for structured data management, dealing with
heterogeneity at web scale presents many new challenges. In this paper, we
highlight these challenges in two scenarios - the Deep Web and Google Base. We
contend that traditional data integration techniques are no longer valid in the
face of such heterogeneity and scale. We propose a new data integration
architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes
pay-as-you-go data management as a means of achieving web-scale data integration.
