Jump to Content

TSum: Fast, Principled Table Summarization.

Jieying Chen
Christos Faloutsos
Spiros Papadimitriou
Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, ACM (2013)

Abstract

Given a table where rows correspond to records and columns correspond to attributes, we want to find a small number of patterns that succinctly summarize the dataset. For example, given a set of patient records with several attributes each, how can we find (a) that the "most representative" pattern is, say, (male, adult, *), followed by (*, child, low-cholesterol), etc.? We propose TSum, a method that provides a sequence of patterns ordered by their "representativeness." It can decide both which these patterns are, as well as how many are necessary to properly summarize the data. Our main contribution is formulating a general framework, TSum, using compression principles. TSum can easily accommodate different optimization strategies for selecting and refining patterns. The discovered patterns can be used to both represent the data efficiently, as well as interpret it quickly. Extensive experiments demonstrate the effectiveness and intuitiveness of our discovered patterns.