In Dremel, data is stored as nested relations. The schema for a relation is a tree,
all of whose nodes are attributes, and whose leaf attributes hold values. We
explore filter and aggregate queries that are given in the Dremel dialect of SQL.
Complications arise because of repeated attributes, i.e., attributes that are
allowed to have more than one value. We focus on the common class of Dremel queries
that are processed on column-stored data in a way that results in query processing
time that is linear on the size of the relevant data, i.e., data in the columns
that participate in the query. We formally define the data model, the query
language and the algorithms for query processing in column-stored data. The
concepts of repetition context and semi-flattening are introduced here and play a
central role in understanding this class of queries and their algorithms.