Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of
Hours of Unsupervised, Noisy Data
Abstract: The acoustic models in state-of-the-art speech recognition
systems are based on phones in context that are represented by hidden Markov models.
This modeling approach may be limited in that it is hard to incorporate long-span
acoustic context. Exemplar-based approaches are an attractive alternative, in
particular if massive data and computational power are available. Yet, most of the data
at Google are unsupervised and noisy. This paper investigates an exemplar-based
approach under this not yet well understood data regime. A log-linear rescoring
framework is used to combine the exemplar-based features on the word level with the
first-pass model. This approach guarantees at least baseline performance and focuses on
the refined modeling of words with sufficient data. Experimental results for the Voice
Search and the YouTube tasks are presented.
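
As a rough illustration of the log-linear rescoring idea, the sketch below adds weighted word-level exemplar features to a first-pass score and picks the best hypothesis from an n-best list. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `Hypothesis`, `loglinear_score`, and `rescore`, the feature keys, and all numeric values are hypothetical. Features without a trained weight contribute nothing, so with no exemplar weights the ranking falls back to the first-pass ranking, which is one way to read the "at least baseline performance" property.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    words: List[str]
    # Log-score assigned by the first-pass model (assumed to be available).
    first_pass_score: float
    # Hypothetical word-level exemplar feature values, keyed by feature name.
    exemplar_features: Dict[str, float]


def loglinear_score(hyp: Hypothesis,
                    weights: Dict[str, float],
                    first_pass_weight: float = 1.0) -> float:
    """Weighted sum of the first-pass score and the exemplar features."""
    score = first_pass_weight * hyp.first_pass_score
    for name, value in hyp.exemplar_features.items():
        # Features with no trained weight (e.g. words with too little data)
        # get weight 0, so they leave the first-pass score unchanged.
        score += weights.get(name, 0.0) * value
    return score


def rescore(nbest: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest log-linear score."""
    return max(nbest, key=lambda h: loglinear_score(h, weights))


# Toy n-best list with made-up scores and feature values.
nbest = [
    Hypothesis(["call", "mom"], -10.0, {"exemplar:mom": 0.5}),
    Hypothesis(["call", "tom"], -9.5, {"exemplar:tom": -2.0}),
]

baseline_best = rescore(nbest, weights={})
rescored_best = rescore(nbest, weights={"exemplar:mom": 1.0, "exemplar:tom": 1.0})
```

In this toy setup, an empty weight dictionary reproduces the first-pass choice, while non-zero weights let the exemplar features reorder the hypotheses.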