Fast Algorithms for Finding Extremal Sets
Identifying the extremal (minimal and maximal) sets from a collection of sets is an important subproblem in the areas of data-mining and satisﬁability checking. For example, extremal set ﬁnding algorithms are used in the context of mining maximal frequent itemsets, and for simplifying large propositional satisﬁability instances derived from real world tasks such as circuit routing and veriﬁcation. In this paper, we describe two new algorithms for the task and detail their performance on real and synthetic data. Each algorithm leverages an entirely diﬀerent principle – one primarily exploits set cardinality constraints, the other lexicographic constraints. Despite the inherent diﬃculty of this problem (the best known worst-case bounds are nearly quadratic), we show that both these algorithms provide excellent performance in practice, and can identify all extremal sets from multi-gigabyte itemset data using only a single processor core. Both algorithms are concise and can be implemented in no more than a few hundred lines of code. Our reference C++ implementations are open source and available for download.