Query Language Modeling for Voice Search
Venue
Proceedings of the 2010 IEEE Workshop on Spoken Language Technology, IEEE, pp. 127-132
Publication Year
2010
Authors
Ciprian Chelba, Johan Schalkwyk, Thorsten Brants, Vida Ha, Boulos Harb, Will Neveitt, Carolina Parada, Peng Xu
Abstract
The paper presents an empirical exploration of google.com query stream language
modeling. We describe the normalization of the typed query stream resulting in
out-of-vocabulary (OoV) rates below 1% for a one million word vocabulary. We
present a comprehensive set of experiments that guided the design decisions for a
voice search service. In the process, we rediscovered a lesser-known interaction
between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that
hints at non-stationarity of the query stream, as well as a strong dependence on
English locale (USA, Britain, and Australia).
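
To make the OoV figure quoted in the abstract concrete, here is a minimal sketch of how an out-of-vocabulary rate might be computed. It is not the authors' pipeline: the function names `build_vocabulary` and `oov_rate`, the whitespace tokenization, and the toy queries are all assumptions for illustration. The paper's normalization of the typed query stream is not reproduced here.

```python
from collections import Counter


def build_vocabulary(train_queries, max_size):
    """Keep the max_size most frequent words seen in the training queries."""
    # Assumption: queries are already normalized and split on whitespace.
    counts = Counter(word for query in train_queries for word in query.split())
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(test_queries, vocabulary):
    """Fraction of test word tokens that are not in the vocabulary."""
    words = [word for query in test_queries for word in query.split()]
    if not words:
        return 0.0
    missing = sum(1 for word in words if word not in vocabulary)
    return missing / len(words)


# Toy data (hypothetical), standing in for the query stream.
train = ["weather in boston", "boston red sox", "weather in london"]
test = ["weather in paris", "boston weather"]

vocab = build_vocabulary(train, max_size=3)
rate = oov_rate(test, vocab)
print(f"OoV rate: {rate:.1%}")
```

In this toy setup only "paris" falls outside the three-word vocabulary, so one of the five test tokens is out of vocabulary. The abstract's figure corresponds to the same ratio falling below 1% for a one million word vocabulary.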
