Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation
Venue
Google (2014)
Publication Year
2014
Authors
Noam M. Shazeer, Joris Pelemans, Ciprian Chelba
Abstract
We present a novel family of language model (LM) estimation techniques named Sparse
Non-negative Matrix (SNM) estimation. A first set of experiments empirically
evaluating it on the One Billion Word Benchmark shows that SNM n-gram LMs perform
almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram
features, the models are able to match the state-of-the-art recurrent neural network
(RNN) LMs; combining the two modeling techniques yields the best known result on
the benchmark. The computational advantages of SNM over both maximum entropy and
RNN LM estimation are probably its main strength: it promises an approach that
combines arbitrary features as flexibly and effectively as those techniques, yet
should scale to very large amounts of data as gracefully as n-gram LMs do.
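To make the idea of a sparse non-negative matrix estimate over n-gram and skip-gram features more concrete, here is a minimal sketch. It assumes the estimate has the form P(w | h) = y_w / sum_w' y_w', where y = x M, x is the binary vector of features active in the history h, and M is a sparse non-negative matrix. As a simplifying assumption, each entry M[f][w] is set to the plain relative frequency C(f, w) / C(f), and any learned adjustment of the entries is omitted. The feature extractor (`features`, with hypothetical parameters `max_n` and `max_skip`) and the helpers `train` and `prob` are illustrative names, not the authors' implementation.

```python
from collections import Counter, defaultdict


def features(history, max_n=2, max_skip=3):
    """Hypothetical extractor: n-gram suffix features plus simple skip-grams."""
    feats = [("ngram", ())]  # empty context, behaves like a unigram feature
    for k in range(1, max_n + 1):
        if len(history) >= k:
            feats.append(("ngram", tuple(history[-k:])))
    # Skip-gram: the word d positions back, skipping the d-1 words in between.
    for d in range(2, max_skip + 1):
        if len(history) >= d:
            feats.append(("skip", d, history[-d]))
    return feats


def train(sentences, **kw):
    """Count C(f, w) for every (active feature, next word) event.

    Only observed (f, w) pairs are stored, so the matrix stays sparse.
    """
    c_fw = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for f in features(sent[:i], **kw):
                c_fw[f][w] += 1
    return c_fw


def prob(c_fw, history, word, **kw):
    """P(word | history) = y_word / sum_w' y_w', with y = x M.

    Assumption: M[f][w] = C(f, w) / C(f) (no learned adjustment).
    """
    active = [f for f in features(history, **kw) if f in c_fw]
    y = Counter()
    for f in active:
        total = sum(c_fw[f].values())  # C(f)
        for w, c in c_fw[f].items():
            y[w] += c / total  # accumulate the non-negative entry M[f][w]
    z = sum(y.values())
    return y[word] / z if z else 0.0


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    counts = train(corpus)
    print(prob(counts, ["the", "cat"], "sat"))
```

On this toy corpus the history "the cat" activates four features (empty context, "cat", "the cat", and the skip-gram "the" two positions back), and the estimate for "sat" is 5/6. Because every row of this unadjusted M sums to one, the result is simply an equal-weight interpolation of the per-feature relative frequencies; the point of SNM-style estimation is that the entries of M can instead be tuned, while the computation remains a sparse lookup and sum.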
