Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
Venue: Proceedings of Interspeech 2012
Publication Year: 2012
Authors: Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke
Abstract
The use of Deep Belief Networks (DBNs) to pretrain neural networks has recently led
to a resurgence of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid
systems for Automatic Speech Recognition (ASR). In this paper we report results for a
DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much
larger than any previously reported with DBN-pretrained ANN/HMM systems: 5870 hours of
Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained
ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model
(GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute WER; on the
second, it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual
Information (MMI) fine-tuning and model combination using Segmental Conditional Random
Fields (SCARF) give additional absolute gains of 0.1% and 0.4% on the first dataset
and 0.5% and 0.9% on the second.
