Reducing the Sampling Complexity of Topic Models

Aaron Li

Amr Ahmed

Sujith Ravi

Alexander J Smola

ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)

Download Google Scholar

Abstract

Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately, the generative model loses sparsity with the increase in data, requiring O(k) operations per word for k latent states, such as topics. In this paper we propose an algorithm which requires only O(kd) operations per word, where kd is the number of actually instantiated topics in the document. For large document collections and structured hierarchical models kd ≪ k, thus yielding an order of magnitude speedup. Our method is general and it applies to a wide variety of statistical models. At its core is the idea that dense, rapidly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, judicious use of sparsity, and amortized preprocessing via the alias method.

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Reducing the Sampling Complexity of Topic Models

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Reducing the Sampling Complexity of Topic Models

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities