Publication Data
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Abstract: A critical component of a speech recognition system
targeting web search is the language model. The talk presents an empirical exploration
of the google.com query stream with the end goal of high quality statistical language
modeling for mobile voice search. Our experiments show that after text normalization
the query stream is not as "wild" as it seems at first sight. One can achieve
out-of-vocabulary rates below 1% using a one-million-word vocabulary, and excellent
n-gram hit ratios of 77% and 88% even at high orders such as n=5 and n=4, respectively. Using
large-scale, distributed language models can improve performance significantly: up to
10% relative reduction in word error rate over conventional models used in speech
recognition. We also find that the query stream is non-stationary, which means that
adding more past training data beyond a certain point provides diminishing returns, and
may even degrade performance slightly. Perhaps less surprisingly, we show that
locale matters significantly for English query data across the USA, Great Britain and
Australia. In an attempt to leverage the speech data in voice search logs, we
successfully build large-scale discriminative n-gram language models and obtain small
but significant gains in recognition performance.
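The abstract reports results as an out-of-vocabulary (OOV) rate, an n-gram hit ratio and a relative word-error-rate reduction. Below is a minimal sketch of how these three quantities can be computed. It assumes that the OOV rate is the fraction of test tokens missing from the vocabulary, that the n-gram hit ratio is the fraction of order-n test n-grams (taken within each query) found verbatim in the model's n-gram set, and that the relative reduction is measured against a baseline WER. The function names and the toy queries and WER values are hypothetical illustrations, not data or code from the work.

```python
# Hypothetical helpers illustrating the metrics named in the abstract.
# Queries are lists of already-normalized tokens; n-grams never cross
# query boundaries because each query is treated independently.

def oov_rate(queries, vocab):
    """Fraction of all test tokens that are not in the vocabulary."""
    total = sum(len(q) for q in queries)
    if total == 0:
        return 0.0
    oov = sum(1 for q in queries for tok in q if tok not in vocab)
    return oov / total


def ngram_hit_ratio(queries, lm_ngrams, n):
    """Fraction of order-n test n-grams present verbatim in the LM's n-gram set."""
    grams = [tuple(q[i:i + n]) for q in queries for i in range(len(q) - n + 1)]
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in lm_ngrams)
    return hits / len(grams)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy data (hypothetical, for illustration only).
vocab = {"weather", "in", "new", "york", "pizza"}
queries = [["weather", "in", "new", "york"], ["pizza", "near", "me"]]
lm_bigrams = {("weather", "in"), ("in", "new"), ("new", "york"), ("pizza", "near")}

print(oov_rate(queries, vocab))                 # 2 of 7 tokens are OOV
print(ngram_hit_ratio(queries, lm_bigrams, 2))  # 4 of 5 bigrams are hits
print(relative_reduction(20.0, 18.0))           # 20% -> 18% WER
```

In this toy example, "near" and "me" are out of vocabulary, and only the bigram ("near", "me") is missing from the model. Computing n-grams per query rather than over a concatenated stream avoids counting spurious n-grams that span two unrelated queries.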
