Diversity maximization under matroid constraints
Venue
KDD, ACM SIGKDD (2013), pp. 32-40
Publication Year
2013
Authors
Zeinab Abbassi, Vahab Mirrokni, Mayur Thakur
BibTeX
Abstract
Aggregator websites typically present documents in the form of representative
clusters. In order for users to get a broader perspective, it is important to
deliver a diversified set of representative documents in those clusters. One
approach to diversification is to maximize the average dissimilarity among
documents. Another way to capture diversity is to avoid showing several documents
from the same category (e.g. from the same news channel). We combine the above two
diversification concepts by modeling the latter approach as a (partition) matroid
constraint, and study diversity maximization problems under matroid constraints. We
present the first constant-factor approximation algorithm for this problem, using a
new technique. Our local search 0:5-approximation algorithm is also the first
constant-factor approximation for the max-dispersion problem under matroid
constraints. Our combinatorial proof technique for maximizing diversity under
matroid constraints uses the existence of a family of Latin squares which may also
be of independent interest. In order to apply these diversity maximization
algorithms in the context of aggregator websites and as a preprocessing step for
our diversity maximization tool, we develop greedy clustering algorithms that
maximize weighted coverage of a predefined set of topics. Our algorithms are based
on computing a set of cluster centers, where clusters are formed around them. We
show the better performance of our algorithms for diversity and coverage
maximization by running experiments on real (Twitter) and synthetic data in the
context of real-time search over micro-posts. Finally we perform a user study
validating our algorithms and diversity metrics.
