Fast Algorithms for Finding Extremal Sets
Venue
Proc. of the 2011 SIAM Int'l Conf. on Data Mining (to appear)
Publication Year
2011
Authors
Roberto J. Bayardo, Biswanath Panda
BibTeX
Abstract
Identifying the extremal (minimal and maximal) sets from a collection of sets is an
important subproblem in the areas of data-mining and satisfiability checking. For
example, extremal set finding algorithms are used in the context of mining maximal
frequent itemsets, and for simplifying large propositional satisfiability instances
derived from real world tasks such as circuit routing and verification. In this
paper, we describe two new algorithms for the task and detail their performance on
real and synthetic data. Each algorithm leverages an entirely different principle –
one primarily exploits set cardinality constraints, the other lexicographic
constraints. Despite the inherent difficulty of this problem (the best known
worst-case bounds are nearly quadratic), we show that both these algorithms provide
excellent performance in practice, and can identify all extremal sets from
multi-gigabyte itemset data using only a single processor core. Both algorithms are
concise and can be implemented in no more than a few hundred lines of code. Our
reference C++ implementations are open source and available for download.
