Neural Networks for Acoustic Modeling in Speech Recognition
Abstract: Most current speech recognition systems use hidden Markov
models (HMMs) to deal with the temporal variability of speech and Gaussian mixture
models to determine how well each state of each HMM fits a frame or a short window of
frames of coefficients that represents the acoustic input. An alternative way to
evaluate the fit is to use a feedforward neural network that takes several frames of
coefficients as input and produces posterior probabilities over HMM states as output.
Deep neural networks that have many hidden layers and are trained using new methods have
been shown to outperform Gaussian mixture models on a variety of speech recognition
benchmarks, sometimes by a large margin. This paper provides an overview of this
progress and represents the shared views of four research groups who have had recent
successes in using deep neural networks for acoustic modeling in speech recognition.
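
To make the input/output relationship described above concrete, the following is a minimal sketch of a feedforward network that maps a window of several frames of coefficients to a posterior distribution over HMM states. It is illustrative only, not the systems surveyed here: the weights are random and untrained, and the function names (stack_context, forward), the context width of 11 frames, the 13 coefficients per frame, the two hidden layers of 32 units, the 8 states, and the logistic hidden units are all assumptions chosen for the example.

```python
import numpy as np

def stack_context(frames, left=5, right=5):
    """Concatenate each frame with its neighbours into one input vector.

    Edges are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    window = left + 1 + right
    n_frames = frames.shape[0]
    return np.stack([padded[t:t + window].reshape(-1) for t in range(n_frames)])

def forward(x, weights, biases):
    """Logistic hidden layers followed by a softmax over HMM states."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    logits = h @ weights[-1] + biases[-1]
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical sizes: 20 frames, 13 coefficients each, 8 HMM states.
rng = np.random.default_rng(0)
n_frames, n_coeffs, n_states = 20, 13, 8
frames = rng.standard_normal((n_frames, n_coeffs))

inputs = stack_context(frames)  # one row per frame, 11 * 13 values per row
sizes = [inputs.shape[1], 32, 32, n_states]
weights = [0.1 * rng.standard_normal((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

# posteriors[t, s] approximates P(state s | window of frames centred at t).
posteriors = forward(inputs, weights, biases)
```

Each row of posteriors is a probability distribution over the states for one frame. With trained weights, these values would play the role that the Gaussian mixture model scores play in a conventional system.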