Language Modeling for Voice Search
Abstract: This paper presents an empirical exploration of language modeling for the
google.com query stream. We describe the normalization of the typed query stream
resulting in out-of-vocabulary (OoV) rates below 1% for a one million word vocabulary.
We present a comprehensive set of experiments that guided the design decisions for a
voice search service. In the process, we rediscovered a lesser-known interaction between
Kneser-Ney smoothing and entropy pruning, and found empirical evidence that hints at
non-stationarity of the query stream, as well as strong dependence on various English
locales: USA, Britain and Australia.
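
For readers unfamiliar with the metric, the out-of-vocabulary (OoV) rate quoted above is conventionally the fraction of test-set word tokens that do not appear in the vocabulary. The following is a minimal sketch of that computation, not the paper's pipeline: the function names, the frequency-based vocabulary selection, the whitespace tokenization, and the toy queries are all assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(train_tokens, max_size):
    """Keep the max_size most frequent training tokens (assumed selection rule)."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    misses = sum(1 for token in test_tokens if token not in vocabulary)
    return misses / len(test_tokens)


# Toy, already-normalized queries (hypothetical data, whitespace-tokenized).
train = "weather in new york weather in london cheap flights to new york".split()
test = "weather in sydney cheap flights to london".split()

vocab = build_vocabulary(train, max_size=1_000_000)
rate = oov_rate(test, vocab)
print(f"OoV rate: {rate:.2%}")
```

In this toy example only "sydney" is unseen, so one of the seven test tokens is out of vocabulary. On real query data the rate depends heavily on how the text is normalized before counting, which is why the normalization step matters for the figure reported in the abstract.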