A Computationally Efficient Algorithm for Learning Topical Collocation Models
Venue
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Association for Computational Linguistics, Beijing, China (2015), pp. 1460-1469
Publication Year
2015
Authors
Zhendong Zhao, Lan Du, Benjamin Börschinger, John K Pate, Massimiliano Ciaramita, Mark Steedman, Mark Johnson
Abstract
Most existing topic models make the bag-of-words assumption that words are generated
independently, and so ignore potentially useful information about word order.
Previous attempts to use collocations (short sequences of adjacent words) in topic
models have either relied on a pipeline approach, restricted attention to bigrams,
or resulted in models whose inference does not scale to large corpora. This paper
studies how to simultaneously learn both collocations and their topic assignments.
We present an efficient reformulation of the Adaptor Grammar-based topical
collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling
algorithm for posterior inference in this new formulation. We further improve the
efficiency of the sampling algorithm by exploiting sparsity and parallelising
inference. Experimental results on text classification, information
retrieval and human evaluation tasks across a range of datasets show that this
reformulation scales to hundreds of thousands of documents while maintaining the
good performance of the AG-colloc model.
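
To give a flavour of the point-wise sampling the abstract refers to, below is a minimal, illustrative sketch in Python of a Gibbs step that resamples a single collocation-boundary variable between adjacent words. Everything here is an assumption for illustration: the class and method names, the CRP-style predictive probability with a uniform per-word base measure, and the parameter values are not taken from the paper, whose actual reformulation of AG-colloc also resamples topic assignments and adds the sparsity and parallelisation optimisations mentioned above.

    import random
    from collections import defaultdict

    class PointwiseCollocationSampler:
        """Illustrative sketch: flip one boundary variable at a time
        (hypothetical simplification, not the paper's algorithm)."""

        def __init__(self, alpha=1.0, vocab_size=1000):
            self.alpha = alpha              # concentration of an assumed CRP prior
            self.vocab_size = vocab_size    # assumed vocabulary size for the base measure
            self.counts = defaultdict(int)  # (topic, collocation) -> token count
            self.totals = defaultdict(int)  # topic -> total collocation tokens

        def add(self, topic, colloc):
            self.counts[(topic, colloc)] += 1
            self.totals[topic] += 1

        def remove(self, topic, colloc):
            self.counts[(topic, colloc)] -= 1
            self.totals[topic] -= 1

        def prob(self, topic, colloc):
            # CRP predictive probability with a crude uniform base measure
            # over word strings (an assumption made for brevity).
            base = (1.0 / self.vocab_size) ** len(colloc)
            return ((self.counts[(topic, colloc)] + self.alpha * base)
                    / (self.totals[topic] + self.alpha))

        def resample_boundary(self, words, boundaries, i, topic):
            """Resample the boundary indicator between words[i] and words[i+1]."""
            # Locate the maximal span whose segmentation depends on boundary i.
            left = i
            while left > 0 and not boundaries[left - 1]:
                left -= 1
            right = i + 1
            while right < len(words) - 1 and not boundaries[right]:
                right += 1
            full = tuple(words[left:right + 1])
            lhs = tuple(words[left:i + 1])
            rhs = tuple(words[i + 1:right + 1])

            # Remove the affected collocation token(s), score the "merged"
            # and "split" hypotheses, sample, then restore the counts.
            if boundaries[i]:
                self.remove(topic, lhs); self.remove(topic, rhs)
            else:
                self.remove(topic, full)
            p_merge = self.prob(topic, full)
            # For brevity this ignores the count update between the two
            # split factors that a fully exchangeable derivation would use.
            p_split = self.prob(topic, lhs) * self.prob(topic, rhs)
            boundaries[i] = random.random() < p_split / (p_merge + p_split)
            if boundaries[i]:
                self.add(topic, lhs); self.add(topic, rhs)
            else:
                self.add(topic, full)

    # Initialise with every boundary on (each word its own collocation),
    # then run point-wise sweeps for a single fixed topic.
    words = "the new york times reported it".split()
    topic, boundaries = 0, [True] * (len(words) - 1)
    sampler = PointwiseCollocationSampler()
    for w in words:
        sampler.add(topic, (w,))
    for _ in range(100):
        for i in range(len(boundaries)):
            sampler.resample_boundary(words, boundaries, i, topic)

Because each step touches only the counts for the spans adjacent to one boundary, a sweep is cheap relative to resampling whole parse trees, which is the intuition behind the scalability claim; the paper's actual sampler jointly handles topics and exploits sparsity in the count tables.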
