MedLDA: Maximum Margin Supervised Topic Models
Abstract
A supervised topic model can utilize side information such as ratings or labels associated with documents or images to discover more predictive low-dimensional topical
representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular
and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we
propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model,
which integrates the mechanism behind the max-margin prediction models (e.g., SVMs)
with the mechanism behind the hierarchical Bayesian topic models (e.g., LDA) under a
unified constrained optimization framework, and yields latent topical representations that are
more discriminative and more suitable for prediction tasks such as document classification
or regression. The principle underlying the MedLDA formalism is quite general and can be
applied to joint max-margin and maximum likelihood learning of directed or undirected
topic models when supervised side information is available. Efficient variational methods
for posterior inference and parameter estimation are derived and extensive empirical studies
on several real data sets are also provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA can: 1) discover sparse and highly discriminative
topical representations; and 2) achieve state-of-the-art prediction performance.
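In schematic form, the unified framework couples the topic model's variational objective with max-margin constraints. The following is a sketch in our own notation, not the paper's exact formulation: here $\mathcal{L}(q)$ stands for the variational bound on the negative log-likelihood, $f(y,\bar{\mathbf{z}}_d)$ for a discriminant function of the average topic assignments $\bar{\mathbf{z}}_d$ of document $d$, $\ell_d(y)$ for a label loss, and $\xi_d$ for slack variables:

```latex
\min_{q,\, \boldsymbol{\alpha},\, \boldsymbol{\beta},\, \boldsymbol{\xi} \ge 0}\;
  \mathcal{L}(q;\, \boldsymbol{\alpha}, \boldsymbol{\beta})
  \;+\; C \sum_{d=1}^{D} \xi_d
\qquad
\text{s.t.}\;\; \forall d,\; \forall y \neq y_d:\;
  \mathbb{E}_{q}\!\left[ f(y_d, \bar{\mathbf{z}}_d) - f(y, \bar{\mathbf{z}}_d) \right]
  \;\ge\; \ell_d(y) - \xi_d .
```

Minimizing $\mathcal{L}(q)$ alone recovers likelihood-driven learning as in LDA, while the expected-margin constraints play the role of an SVM-style regularizer on the latent representation; the constant $C$ trades off the two goals.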