Rolling Up Random Variables in Data Cubes
Venue
Joint Statistical Meetings, American Statistical Association, 732 North Washington Street, Alexandria, VA 22314-1943 (2013)
Publication Year
2013
Authors
Abstract
Data cubes, first developed in the context of online analytical processing (OLAP)
applications for databases, have become increasingly widespread as a means of
structuring data aggregations in other contexts. For example, increasing levels of
aggregation in a data cube can be used to impose a hierarchical structure (often
referred to as roll-ups) on sets of cross-categorized values, producing a summary
description that takes advantage of commonalities within the cube categories. In
this paper, we describe a novel technique for realizing such a hierarchical
structure in a data cube containing discrete random variables. Using a
generalization of an approach due to Chow and Liu, this technique construes
roll-ups as parsimonious approximations to the joint distribution of the variables
in terms of the aggregation structure of the cube. The technique is illustrated
using a real-life application that involves monitoring and reporting anomalies in
Web traffic streams over time.
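
For readers unfamiliar with the Chow and Liu approach mentioned above, the
following is a minimal sketch of the classical method only, not of the
generalization described in this paper. It approximates a joint distribution
of discrete variables by a tree: pairwise mutual information is estimated from
the data, and a maximum-weight spanning tree is built over the variables. The
function names (mutual_information, chow_liu_edges) and the toy data are
assumptions made for illustration and do not come from the paper.

```python
import itertools
import math
from collections import Counter


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    marg_i = Counter(r[i] for r in rows)
    marg_j = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_edges(rows):
    """Edges of a maximum-weight spanning tree over the columns.

    Edge weights are pairwise mutual information; the tree is built with
    Kruskal's algorithm and a small union-find structure.
    """
    d = len(rows[0])
    weighted = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in itertools.combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))

    def find(x):
        # Follow parent links to the component root, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Hypothetical toy data: columns 0 and 1 always agree, column 2 is unrelated.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
edges = chow_liu_edges(rows)
```

In this toy data the strongly dependent pair (columns 0 and 1) is linked first,
and the unrelated column is attached by a zero-weight edge. Each retained edge
then corresponds to one pairwise conditional factor in the tree-structured
approximation to the joint distribution.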
