TSum: Fast, Principled Table Summarization.
Venue
Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, ACM (2013)
Publication Year
2013
Authors
Jieying Chen, Jia-Yu Pan, Christos Faloutsos, Spiros Papadimitriou
Abstract
Given a table where rows correspond to records and columns correspond to
attributes, we want to find a small number of patterns that succinctly summarize
the dataset. For example, given a set of patient records with several attributes
each, how can we find that the "most representative" pattern is, say, (male,
adult, *), followed by (*, child, low-cholesterol), etc.? We propose TSum, a method
that provides a sequence of patterns ordered by their "representativeness." It
decides both which patterns these are and how many are necessary to properly
summarize the data. Our main contribution is formulating a general framework, TSum,
using compression principles. TSum can easily accommodate different optimization
strategies for selecting and refining patterns. The discovered patterns can be used
both to represent the data efficiently and to interpret it quickly. Extensive
experiments demonstrate the effectiveness and intuitiveness of our discovered
patterns.
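
To make the notion of a pattern such as (male, adult, *) concrete, the following is a minimal sketch, not the method from the paper. It assumes a toy scoring rule: a pattern "saves" one cell for every fixed attribute in every row it covers, minus the cost of stating the pattern once. The names `matches`, `toy_saving`, `candidate_patterns`, and `rank_patterns` are hypothetical, and the abstract does not specify TSum's actual compression-based cost.

```python
from itertools import product

WILDCARD = "*"


def matches(pattern, row):
    """A row matches if every non-wildcard position agrees with the pattern."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, row))


def toy_saving(pattern, rows):
    """Illustrative score only (an assumption, not TSum's encoding cost).

    Each covered row no longer needs its fixed attributes spelled out,
    and the pattern itself is stated once.
    """
    fixed = sum(p != WILDCARD for p in pattern)
    covered = sum(matches(pattern, r) for r in rows)
    return covered * fixed - fixed


def candidate_patterns(rows):
    """Every generalization of every row (each attribute kept or wildcarded)."""
    seen = set()
    for row in rows:
        for mask in product([True, False], repeat=len(row)):
            seen.add(tuple(v if keep else WILDCARD for v, keep in zip(row, mask)))
    return seen


def rank_patterns(rows):
    """Order candidates by decreasing toy saving; ties broken lexicographically."""
    return sorted(candidate_patterns(rows), key=lambda p: (-toy_saving(p, rows), p))


if __name__ == "__main__":
    # Made-up toy records: (gender, age group, cholesterol)
    rows = [
        ("male", "adult", "high"),
        ("male", "adult", "low"),
        ("male", "adult", "high"),
        ("female", "child", "low"),
        ("male", "child", "low"),
    ]
    for pattern in rank_patterns(rows)[:3]:
        print(pattern, toy_saving(pattern, rows))
```

This sketch scores each pattern independently and enumerates all generalizations, which is exponential in the number of attributes. It therefore only illustrates what "ranking patterns by how much they compress the table" could mean on a tiny example.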
