Rolling Up Random Variables in Data Cubes
Joint Statistical Meetings, American Statistical Association, 732 North Washington Street, Alexandria, VA 22314-1943 (2013)
Data cubes, first developed in the context of on-line analytic processing (OLAP) applications for databases, have become increasingly widespread as a means of structuring data aggregations in other contexts. For example, increasing levels of aggregation in a data cube can be used to impose a hierarchical structure---often referred to as roll-ups---on sets of cross-categorized values, producing a summary description that takes advantage of commonalities within the cube categories. In this paper, we describe a novel technique for realizing such a hierarchical structure in a data cube containing discrete random variables. Using a generalization of an approach due to Chow and Liu, this technique construes roll-ups as parsimonious approximations to the joint distribution of the variables in terms of the aggregation structure of the cube. The technique is illustrated using a real-life application that involves monitoring and reporting anomalies in Web traffic streams over time.