Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data
Venue
Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE, to appear
Publication Year
2015
Authors
Ciprian Chelba, Noam M. Shazeer
Abstract
The paper investigates the impact on query language modeling of using skip-grams
within a query as well as across queries in a given search session, in conjunction
with the geo-annotation available for the query stream data. As a modeling tool we
use the recently proposed sparse non-negative matrix estimation technique, since it
offers the same expressive power as the well-established maximum entropy approach
in combining arbitrary context features. Experiments on the google.com query stream
show that, using session-level and geo-location context, we can expect a relative
reduction in perplexity (PPL) of 34% over the Kneser-Ney N-gram baseline; when
evaluating on the "local" subset of the query stream, the relative reduction in PPL
is 51%, more than a bit (halving the perplexity corresponds to one bit of
cross-entropy). Both sources of context information (geo-location and previous
queries in the session) are about equally valuable in building a language model for
the query stream.
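The "more than a bit" remark follows from the definition of perplexity as PPL = 2^H, where H is the per-token cross-entropy in bits. A relative PPL reduction r therefore lowers the cross-entropy by -log2(1 - r) bits, so a 50% reduction is exactly one bit. The minimal Python sketch below does this arithmetic for the two figures quoted in the abstract; the helper name `bits_saved` is our own illustrative choice, not something from the paper.

```python
import math


def bits_saved(relative_ppl_reduction: float) -> float:
    """Cross-entropy reduction (bits per token) implied by a relative PPL reduction.

    Since PPL = 2**H, the ratio PPL_new / PPL_base equals 1 - r, hence
    H_base - H_new = -log2(1 - r).
    """
    if not 0.0 <= relative_ppl_reduction < 1.0:
        raise ValueError("relative reduction must be in [0, 1)")
    return -math.log2(1.0 - relative_ppl_reduction)


# The two relative reductions reported in the abstract.
for label, r in [("full query stream", 0.34), ("local subset", 0.51)]:
    print(f"{label}: {r:.0%} PPL reduction = {bits_saved(r):.2f} bits")
```

Running it prints about 0.60 bits for the 34% reduction and about 1.03 bits for the 51% reduction, which is why only the "local" result clears the one-bit mark.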
