Contextual prediction models for speech recognition
Venue
Proceedings of Interspeech 2016 (to appear)
Publication Year
2016
Authors
Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Baeuml
Abstract
We introduce an approach to biasing language models towards known contexts without
requiring separate language models or explicit contextually-dependent conditioning
contexts. We do so by presenting an alternative ASR objective, where we predict the
acoustics and words given the contextual cue, such as the geographic location of
the speaker. A simple factoring of the model results in an additional biasing term,
which effectively indicates how correlated a hypothesis is with the contextual cue
(e.g., given the hypothesized transcript, how likely is the user’s known location).
We demonstrate that this factorization allows us to train relatively small
contextual models which are effective in speech recognition. An experimental
analysis shows both a perplexity reduction and a significant word error rate
reduction on a voice search task when using the user's location as a contextual
cue.
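
One plausible reading of the factoring described above, written as a sketch in LaTeX: assuming the acoustics A are conditionally independent of the contextual cue C given the word sequence W (an assumption made here for illustration, not stated explicitly in the abstract), Bayes' rule yields the standard ASR terms plus a biasing factor.

% Sketch of the contextual objective: predict acoustics A and words W given context C.
\begin{align*}
  P(A, W \mid C) &= P(A \mid W, C)\, P(W \mid C) \\
                 &\approx \underbrace{P(A \mid W)\, P(W)}_{\text{standard ASR model}}
                    \cdot \underbrace{\frac{P(C \mid W)}{P(C)}}_{\text{biasing term}}
\end{align*}
% P(C | W) asks: given the hypothesized transcript W, how likely is the
% observed context C (e.g., the user's known location)? Since P(C) is
% constant across hypotheses, it does not affect the ranking of transcripts.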
