Distributed Acoustic Modeling with Back-off N-grams
Venue
Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
Publication Year
2012
Authors
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
Abstract
The paper proposes an approach to acoustic modeling that borrows from n-gram
language modeling in an attempt to scale up both the amount of training data and
model size (as measured by the number of parameters in the model) to approximately
100 times larger than current sizes used in ASR. Unseen phonetic contexts are
handled with the familiar back-off technique from language modeling, chosen for
its implementation simplicity. The new acoustic model is estimated and
stored using the MapReduce distributed computing infrastructure. Speech recognition
experiments are carried out in an N-best rescoring framework for Google Voice
Search. 87,000 hours of training data are obtained in an unsupervised fashion by
filtering utterances in Voice Search logs on ASR confidence. The resulting models are
trained using maximum likelihood and contain 20-40 million Gaussians. They achieve
relative reductions in WER of 11% and 6% over first-pass models trained using
maximum likelihood and boosted MMI, respectively.
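The core idea of backing off from unseen phonetic contexts can be illustrated with a minimal sketch, by analogy with back-off n-gram language models. This is not the paper's implementation; the model keys, penalty value, and `log_likelihood` interface below are all assumptions made for illustration.

```python
# Minimal sketch of scoring a frame against the longest phonetic context
# observed in training, backing off to shorter contexts when needed.
# All names and the penalty value are illustrative, not from the paper.

BACKOFF_COST = 2.0  # assumed per-step log-likelihood penalty


def score_frame(models, context, frame):
    """Score one acoustic frame with back-off over phonetic contexts.

    `models` maps a tuple of context phones (e.g. a 5-phone window down to a
    single phone, with the empty tuple as a context-independent fallback) to
    an object exposing `log_likelihood(frame)`, e.g. a Gaussian mixture.
    """
    penalty = 0.0
    ctx = tuple(context)
    while ctx:
        model = models.get(ctx)
        if model is not None:
            return model.log_likelihood(frame) - penalty
        ctx = ctx[1:-1]          # shrink the context window symmetrically
        penalty += BACKOFF_COST  # pay a penalty for each back-off step
    # Last resort: context-independent model keyed by the empty tuple.
    return models[()].log_likelihood(frame) - penalty
```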

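The N-best rescoring setup can likewise be sketched in a few lines: the larger back-off acoustic model rescores first-pass hypotheses, and the combined acoustic and language model scores determine the new top hypothesis. The data layout, weights, and `acoustic_score` function below are assumptions, not the paper's code.

```python
# Minimal sketch of N-best rescoring with a second-pass acoustic model.
# Hypothetical interfaces: `nbest` is a list of (hypothesis, lm_logprob)
# pairs from the first pass; `acoustic_score(hyp)` is the new AM score.

def rescore_nbest(nbest, acoustic_score, lm_weight=1.0, am_weight=1.0):
    """Re-rank N-best hypotheses using a new acoustic model score."""
    rescored = [
        (hyp, am_weight * acoustic_score(hyp) + lm_weight * lm_logprob)
        for hyp, lm_logprob in nbest
    ]
    # Return the hypothesis with the highest combined score.
    return max(rescored, key=lambda pair: pair[1])[0]
```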