Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models
Venue
Proceedings of Interspeech 2014
Publication Year
2014
Authors
Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton
Abstract
We describe a simple but effective way of using multi-frame targets to improve the
accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid
systems. In this approach a Deep Neural Network (DNN) is trained to predict the
forced-alignment state of multiple frames using a separate softmax unit for each of
the frames. This is in contrast to the usual method of training a DNN to predict
only the state of the central frame. By itself, this is not sufficient to improve the
accuracy of the system significantly. However, if we average the predictions for
each frame, taken from the different contexts it is associated with, we achieve
state-of-the-art results on TIMIT using a fully connected Deep Neural Network without
convolutional architectures or dropout training. On a 14-hour subset of Wall
Street Journal (WSJ), using a context-dependent DNN-HMM system, it leads to a
relative improvement of 6.4% on the dev set (test-dev93) and 9.3% on the test set
(test-eval92).
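
The averaging step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name combine_multiframe_predictions, the (T, K, S) array layout, and the offsets argument are all hypothetical. The combination is assumed to be a renormalised geometric mean (an average of log posteriors), which is one reading of the "product" in the title; the abstract itself says only that the predictions are averaged.

```python
import numpy as np


def combine_multiframe_predictions(log_probs, offsets):
    """Average the predictions made for each frame from different contexts.

    log_probs: array of shape (T, K, S). Entry [t, k] holds the log posteriors
        over S states that the k-th softmax, evaluated with the input window
        centred at position t, assigns to frame t + offsets[k].
    offsets: sequence of K integer frame offsets (assumed to include 0 so that
        every frame receives at least one prediction).

    Returns an array of shape (T, S) with renormalised log posteriors.
    """
    T, K, S = log_probs.shape
    if len(offsets) != K:
        raise ValueError("need one offset per softmax output")
    if 0 not in offsets:
        raise ValueError("this sketch assumes the central frame is predicted")

    total = np.zeros((T, S))
    count = np.zeros(T)
    for k, off in enumerate(offsets):
        for t in range(T):
            target = t + off
            # Predictions that fall outside the utterance are dropped.
            if 0 <= target < T:
                total[target] += log_probs[t, k]
                count[target] += 1

    # Assumption: average in the log domain (geometric mean of posteriors).
    avg = total / count[:, None]
    # Renormalise so each frame's posteriors sum to one.
    avg -= np.logaddexp.reduce(avg, axis=1, keepdims=True)
    return avg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(6, 3, 4))
    # Log-softmax over the state axis for each of the three output layers.
    log_probs = logits - np.logaddexp.reduce(logits, axis=2, keepdims=True)
    combined = combine_multiframe_predictions(log_probs, offsets=(-1, 0, 1))
    print(combined.shape)
```

In a hybrid system the combined posteriors would then replace the central-frame posteriors before decoding. Frames near the start and end of an utterance are averaged over fewer contexts, which the per-frame count handles here.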
