Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of Hours of Unsupervised, Noisy Data
Venue
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440
Publication Year
2012
Authors
Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke
Abstract
The acoustic models in state-of-the-art speech recognition systems are based on
phones in context that are represented by hidden Markov models. This modeling
approach may be limited in that it is hard to incorporate long-span acoustic
context. Exemplar-based approaches are an attractive alternative, in particular if
massive data and computational power are available. Yet, most of the data at Google
are unsupervised and noisy. This paper investigates an exemplar-based approach
under this not yet well understood data regime. A log-linear rescoring framework is
used to combine the exemplar-based features on the word level with the first-pass
model. This approach guarantees at least baseline performance and focuses on the
refined modeling of words with sufficient data. Experimental results for the Voice
Search and the YouTube tasks are presented.
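
As a rough illustration of the log-linear rescoring idea described in the abstract, the sketch below rescores an n-best list by adding weighted exemplar-based feature values to the first-pass score. It is only a minimal sketch under stated assumptions, not the paper's implementation: the Hypothesis class, the feature name "knn_match", the weights, and all scores are hypothetical. With all weights at zero, the ranking falls back to the first-pass model, which is one way to read the "at least baseline performance" property.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One entry of an n-best list (hypothetical structure for illustration)."""
    words: List[str]
    first_pass_score: float  # log-domain score from the first-pass model
    # Assumed: word-level exemplar feature values, already summed over the words.
    exemplar_features: Dict[str, float]


def rescore(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Log-linear score: first-pass score plus weighted exemplar features."""
    extra = sum(weights.get(name, 0.0) * value
                for name, value in hyp.exemplar_features.items())
    return hyp.first_pass_score + extra


def best_hypothesis(nbest: List[Hypothesis],
                    weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest rescored value."""
    return max(nbest, key=lambda h: rescore(h, weights))


# Toy n-best list with made-up scores and a made-up feature.
nbest = [
    Hypothesis(["call", "mom"], -10.0, {"knn_match": 0.2}),
    Hypothesis(["call", "tom"], -10.5, {"knn_match": 1.5}),
]

# No weights: the first-pass ranking is reproduced unchanged.
baseline_choice = best_hypothesis(nbest, {})
# A nonzero weight lets the exemplar feature change the ranking.
rescored_choice = best_hypothesis(nbest, {"knn_match": 1.0})
```

In this toy setup, features that are missing or given zero weight leave a hypothesis at its first-pass score, so words without enough exemplar data simply keep the baseline decision.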
