Similarity-based Clustering by Left-Stochastic Matrix Factorization
Venue
Journal of Machine Learning Research (JMLR), vol. 14 (2013), pp. 1715-1746
Publication Year
2013
Authors
Raman Arora, Maya R. Gupta, Amol Kapila, Maryam Fazel
Abstract
For similarity-based clustering, we propose modeling the entries of a given
similarity matrix as the inner products of the unknown cluster probabilities. To
estimate the cluster probabilities from the given similarity matrix, we introduce a
left-stochastic non-negative matrix factorization problem. A rotation-based
algorithm is proposed for the matrix factorization. Conditions for unique matrix
factorizations and clusterings are given, and an error bound is provided. The
algorithm is particularly efficient for the case of two clusters, which motivates a
hierarchical variant for cases where the number of desired clusters is large.
Experiments show that the proposed left-stochastic decomposition clustering model
produces relatively high within-cluster similarity on most data sets and can match
given class labels, and that the efficient hierarchical variant performs
surprisingly well.
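
As a rough illustration of the modeling assumption described in the abstract, the sketch below builds a small left-stochastic matrix (each column is one sample's cluster probabilities and sums to one) and forms the similarity matrix whose entries are inner products of those columns. The matrix values and the names P, K, and labels are hypothetical and chosen only for this example. The sketch shows the forward model only. It omits any scaling factor and does not implement the rotation-based factorization algorithm proposed in the paper.

```python
import numpy as np

# Hypothetical toy example: 2 clusters, 4 samples (values are made up).
# Each column of P holds one sample's cluster probabilities and sums to 1;
# this column-sum constraint is what makes P left-stochastic.
P = np.array([
    [0.9, 0.8, 0.2, 0.1],
    [0.1, 0.2, 0.8, 0.9],
])
assert np.allclose(P.sum(axis=0), 1.0)

# Modeled similarity matrix: entry (i, j) is the inner product of the
# cluster-probability vectors of samples i and j.
K = P.T @ P

# Hard cluster labels: the most probable cluster for each sample.
labels = P.argmax(axis=0)

print(np.round(K, 2))
print(labels)
```

In this toy case, samples that share a dominant cluster have larger modeled similarity than samples that do not, which is the structure the factorization problem tries to recover from a given similarity matrix.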
