MedLDA: Maximum Margin Supervised Topic Models
Venue
Journal of Machine Learning Research, 2012 (to appear)
Publication Year
2012
Authors
Jun Zhu, Amr Ahmed, Eric P. Xing
Abstract
A supervised topic model can utilize side information such as ratings or labels
associated with documents or images to discover more predictive low dimensional
topical representations of the data. However, existing supervised topic models
predominantly employ likelihood-driven objective functions for learning and
inference, leaving the popular and potentially powerful max-margin principle
unexploited for seeking predictive representations of data and more discriminative
topic bases for the corpus. In this paper, we propose the maximum entropy
discrimination latent Dirichlet allocation (MedLDA) model, which integrates the
mechanism behind the max-margin prediction models (e.g., SVMs) with the mechanism
behind the hierarchical Bayesian topic models (e.g., LDA) under a unified
constrained optimization framework, and yields latent topical representations that
are more discriminative and more suitable for prediction tasks such as document
classification or regression. The principle underlying the MedLDA formalism is quite
general and can be applied for jointly max-margin and maximum likelihood learning
of directed or undirected topic models when supervising side information is
available. Efficient variational methods for posterior inference and parameter
estimation are derived and extensive empirical studies on several real data sets
are also provided. Our experimental results demonstrate qualitatively and
quantitatively that MedLDA could: 1) discover sparse and highly discriminative
topical representations; 2) achieve state-of-the-art prediction performance;
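
To give a concrete sense of the "unified constrained optimization framework" the abstract refers to, the display below sketches the general shape of such a max-margin supervised topic-model objective for the classification case. This is a schematic reading of the abstract rather than the paper's exact formulation; the symbols L(q) (a variational bound on the LDA likelihood), z_bar_d (the average topic assignment of document d), eta (classifier weights), f(y, z_bar_d) (a label-dependent feature map), l(y_d, y) (a margin/loss function), xi_d (slack variables), and C (a regularization constant) are introduced here purely for illustration.

% A schematic MedLDA-style objective (a sketch only): fit the topic model
% through a variational bound L(q) while requiring predictions made from the
% expected topic representation \bar{z}_d of each labeled document d to
% satisfy soft large-margin constraints.
\[
\begin{aligned}
\min_{q,\,\eta,\,\xi \ge 0}\quad
  & \mathcal{L}(q) \;+\; C \sum_{d=1}^{D} \xi_d \\
\text{s.t.}\quad
  & \mathbb{E}_q\!\left[\eta^{\top}\big(f(y_d,\bar{z}_d) - f(y,\bar{z}_d)\big)\right]
    \;\ge\; \ell(y_d, y) - \xi_d,
  \qquad \forall d,\ \forall y \neq y_d .
\end{aligned}
\]

Read this way, the first term plays the role of the likelihood-driven topic-model objective and the constraints inject the max-margin principle, which is the coupling the abstract describes.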
