This paper presents an empirical exploration of language modeling for the
google.com query stream. We describe the normalization of the typed query stream, which results in
out-of-vocabulary (OoV) rates below 1% for a one-million-word vocabulary. We
present a comprehensive set of experiments that guided the design decisions for a
voice search service. In the process we rediscovered a lesser-known interaction
between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that
hints at non-stationarity of the query stream, as well as a strong dependence on
English locale (USA, Britain and Australia).
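
As a point of reference for the OoV figure quoted above, the following is a minimal sketch of how an out-of-vocabulary rate can be computed: the fraction of word tokens in held-out queries that are missing from a frequency-ranked vocabulary. The function names, the whitespace tokenization, and the toy queries are assumptions made for illustration only; they do not reproduce the normalization pipeline or the data used in the paper.

```python
from collections import Counter


def build_vocabulary(training_queries, max_size):
    """Return the max_size most frequent whitespace-separated words."""
    counts = Counter(
        word for query in training_queries for word in query.split()
    )
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(test_queries, vocabulary):
    """Return the fraction of test word tokens absent from the vocabulary."""
    tokens = [word for query in test_queries for word in query.split()]
    if not tokens:
        return 0.0  # no tokens to score, so report zero by convention
    missing = sum(1 for word in tokens if word not in vocabulary)
    return missing / len(tokens)


# Toy queries, invented purely for illustration.
train = ["weather in boston", "boston red sox", "weather forecast"]
test = ["weather in seattle", "red sox score"]

vocab = build_vocabulary(train, max_size=1_000_000)
rate = oov_rate(test, vocab)
print(f"OoV rate: {rate:.1%}")
```

In this sketch the rate is measured per token rather than per distinct word, so a frequent unknown word counts once for every occurrence in the test queries.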