Deep Neural Networks for Acoustic Modeling in Speech Recognition
Venue
Signal Processing Magazine (2012)
Publication Year
2012
Authors
Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Abstract
Most current speech recognition systems use hidden Markov models (HMMs) to deal
with the temporal variability of speech and Gaussian mixture models to determine
how well each state of each HMM fits a frame or a short window of frames of
coefficients that represents the acoustic input. An alternative way to evaluate the
fit is to use a feedforward neural network that takes several frames of coefficients
as input and produces posterior probabilities over HMM states as output. Deep
neural networks that have many hidden layers and are trained using new methods have
been shown to outperform Gaussian mixture models on a variety of speech recognition
benchmarks, sometimes by a large margin. This paper provides an overview of this
progress and represents the shared views of four research groups who have had
recent successes in using deep neural networks for acoustic modeling in speech
recognition.
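
As a rough illustration of the alternative described in the abstract, the sketch below stacks a short window of coefficient frames into one input vector and passes it through a feedforward network whose softmax output can be read as posterior probabilities over HMM states. It is only a minimal sketch under stated assumptions: the layer sizes, the context width, the logistic hidden units, the random untrained weights, and all function names (stack_frames, init_layer, state_posteriors) are illustrative choices, not details taken from the paper, and no training procedure is shown.

```python
import math
import random


def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Frames past either edge are replaced by the nearest valid frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(layers, x):
    """Forward pass: logistic hidden layers, softmax over HMM states at the output."""
    for layer in layers[:-1]:
        x = [1.0 / (1.0 + math.exp(-a)) for a in affine(layer, x)]
    return softmax(affine(layers[-1], x))


# Hypothetical sizes: 13 coefficients per frame, 5 frames of context on each
# side (an 11-frame window), three hidden layers, 8 HMM states.
rng = random.Random(0)
n_coeffs, context, n_states = 13, 5, 8
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_coeffs)] for _ in range(20)]

inputs = stack_frames(frames, context)
sizes = [n_coeffs * (2 * context + 1), 32, 32, 32, n_states]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# One probability distribution over HMM states per input frame.
posteriors = [state_posteriors(layers, x) for x in inputs]
```

In a hybrid system these per-frame state posteriors would take the place of the Gaussian mixture model scores, while the HMM continues to handle the temporal variability of speech.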
