Data Fusion: Resolving Conflicts from Multiple Sources
Venue
WAIM (2013), pp. 64-76 (to appear)
Publication Year
2013
Authors
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava
Abstract
Many data management applications, such as setting up Web portals, managing
enterprise data, managing community data, and sharing scientific data, require
integrating data from multiple sources. Each of these sources provides a set of
values, and different sources often provide conflicting values. To present
quality data to users, it is critical to resolve conflicts and discover values that
reflect the real world; this task is called data fusion. This paper describes a
novel approach that finds true values from conflicting information when there is a
large number of sources, some of which may copy from others. We present a case
study on real-world data showing that the described algorithm can significantly
improve the accuracy of truth discovery and scales to a large number of data
sources.
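The abstract describes data fusion only at a high level. To illustrate the basic idea of trusting sources according to their estimated accuracy, here is a minimal Python sketch of iterative accuracy-weighted voting. It is not the paper's algorithm. It omits copy detection entirely and normalizes only over the values that sources actually claim. The function `fuse`, its parameters `n_false`, `init_acc`, and `iters`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fuse(claims, n_false=10, init_acc=0.8, iters=20):
    """Iterative accuracy-weighted voting (hypothetical sketch, no copy detection).

    claims: dict mapping source -> dict mapping data item -> claimed value.
    n_false: assumed number of false values per data item (hypothetical).
    Returns (truth, acc): the chosen value per item and the estimated
    accuracy per source.
    """
    acc = {s: init_acc for s in claims}
    items = {i for vals in claims.values() for i in vals}
    prob = {}
    for _ in range(iters):
        prob = {}
        for item in items:
            # Each source votes for its value with a log-odds weight.
            scores = defaultdict(float)
            for s, vals in claims.items():
                if item in vals:
                    a = min(max(acc[s], 1e-6), 1 - 1e-6)  # clamp to keep log finite
                    scores[vals[item]] += math.log(n_false * a / (1 - a))
            # Softmax over the claimed values turns scores into probabilities.
            m = max(scores.values())
            z = sum(math.exp(sc - m) for sc in scores.values())
            prob[item] = {v: math.exp(sc - m) / z for v, sc in scores.items()}
        # A source's accuracy is the mean probability of the values it provides.
        acc = {
            s: sum(prob[i][v] for i, v in vals.items()) / max(len(vals), 1)
            for s, vals in claims.items()
        }
    truth = {i: max(p, key=p.get) for i, p in prob.items()}
    return truth, acc


claims = {
    "s1": {"capital_AU": "Canberra", "capital_CA": "Ottawa"},
    "s2": {"capital_AU": "Canberra", "capital_CA": "Ottawa"},
    "s3": {"capital_AU": "Sydney", "capital_CA": "Ottawa"},
}
truth, acc = fuse(claims)
print(truth["capital_AU"], truth["capital_CA"])
```

In this toy input, s3 disagrees with the majority on one item, so its estimated accuracy drops and its vote counts for less in later rounds. If s2 had simply copied s1, this sketch would still count them as two independent votes. That is the failure mode the copy-aware approach described above is meant to address.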
