Fast Algorithms for Finding Extremal Sets
Abstract
Identifying
the
extremal
(minimal
and
maximal)
sets
from
a
collection of sets is an important subproblem in the areas
of data-mining and satisfiability checking. For example,
extremal set finding algorithms are used in the context
of mining maximal frequent itemsets, and for simplifying
large propositional satisfiability instances derived from real
world tasks such as circuit routing and verification. In
this paper, we describe two new algorithms for the task
and detail their performance on real and synthetic data.
Each algorithm leverages an entirely different principle –
one primarily exploits set cardinality constraints, the other
lexicographic constraints. Despite the inherent difficulty
of this problem (the best known worst-case bounds are
nearly quadratic), we show that both these algorithms
provide excellent performance in practice, and can identify
all extremal sets from multi-gigabyte itemset data using only
a single processor core. Both algorithms are concise and can
be implemented in no more than a few hundred lines of code.
Our reference C++ implementations are open source and
available for download.