Smoothed marginal distribution constraints for language modeling
Venue
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52
Publication Year
2013
Authors
Brian Roark, Cyril Allauzen, Michael Riley
Abstract
We present an algorithm for re-estimating parameters of backoff n-gram language
models so as to preserve given marginal distributions, along the lines of
well-known Kneser-Ney smoothing. Unlike Kneser-Ney, our approach is designed to be
applied to any given smoothed backoff model, including models that have already
been heavily pruned. As a result, the algorithm avoids issues observed when pruning
Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the
benefits of such marginal distribution constraints. We present experimental results
for heavily pruned backoff n-gram models, and demonstrate perplexity and word error
rate reductions when used with various baseline smoothing methods. An open-source
version of the algorithm has been released as part of the OpenGrm ngram library.
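The constraint being preserved is the one that motivates Kneser-Ney smoothing: the lower-order (backoff) distribution should be chosen so that the model's implied marginals match the empirical marginals, i.e. word_marg[w] = sum_h hist_marg[h] * p(w|h). The sketch below is a toy illustration of that idea, not the paper's algorithm: it solves the constraint in closed form for the simplest bigram-to-unigram case, with all function names, data structures, and numbers invented for this example. The paper's method additionally handles arbitrary n-gram orders and already-pruned models, and iterates rather than solving in one pass.

```python
def reestimate_unigrams(bigram_p, alpha, hist_marg, word_marg):
    """Solve for a unigram backoff distribution beta(w) under the marginal
    constraint  word_marg[w] = sum_h hist_marg[h] * p(w | h),  where the
    backoff bigram model defines
        p(w | h) = bigram_p[(h, w)]    if (h, w) is stored explicitly,
                 = alpha[h] * beta[w]  otherwise (standard backoff).

    bigram_p:  {(h, w): prob}  explicitly stored bigram probabilities
    alpha:     {h: weight}     backoff mass left at each history h
    hist_marg: {h: prob}       empirical history marginals
    word_marg: {w: prob}       empirical word marginals to preserve
    """
    beta = {}
    for w in word_marg:
        # Mass for w already accounted for by explicitly stored bigrams.
        explicit = sum(hist_marg[h] * p
                       for (h, x), p in bigram_p.items() if x == w)
        # Total backoff mass that reaches the unigram state for w.
        denom = sum(hist_marg[h] * alpha[h]
                    for h in hist_marg if (h, w) not in bigram_p)
        # Closed-form solution of the constraint. Negatives are clipped:
        # for an arbitrary smoothed or pruned model the constraint need
        # not be exactly satisfiable, which is one reason the paper's
        # general algorithm is iterative.
        num = word_marg[w] - explicit
        beta[w] = max(num, 0.0) / denom if denom > 0 else 0.0
    # Renormalize, since clipping can break normalization.
    total = sum(beta.values())
    return {w: b / total for w, b in beta.items()}

# Toy example with invented numbers.
bigram_p = {("a", "b"): 0.6, ("b", "a"): 0.5}
alpha = {"a": 0.4, "b": 0.5}
hist_marg = {"a": 0.5, "b": 0.5}
word_marg = {"a": 0.5, "b": 0.5}
print(reestimate_unigrams(bigram_p, alpha, hist_marg, word_marg))
```

In the released library, this functionality is exposed through the ngrammarginalize command-line tool of OpenGrm NGram, which operates directly on backoff models of any order.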
