Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks
Venue
Interspeech (2014)
Publication Year
2014
Authors
Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao
Abstract
We recently showed that Long Short-Term Memory (LSTM) recurrent neural networks
(RNNs) outperform state-of-the-art deep neural networks (DNNs) for large-scale
acoustic modeling, where the models were trained with the cross-entropy (CE)
criterion. It has also been shown that sequence discriminative training of DNNs
initially trained with the CE criterion gives significant improvements. In this
paper, we investigate sequence discriminative training of LSTM RNNs in a
large-scale acoustic modeling task. We train the models in a distributed manner
using the asynchronous stochastic gradient descent (ASGD) optimization
technique. We compare two sequence discriminative criteria, maximum mutual
information (MMI) and state-level minimum Bayes risk (sMBR), and we investigate
a number of variations of the basic training strategy to better understand
issues raised by both the sequential model and the objective function. We
obtain significant gains over the CE-trained LSTM RNN model using sequence
discriminative training techniques.
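
The abstract only names the ASGD technique; the distributed setup itself is
described in the body of the paper. As a rough illustration of the general
idea, here is a minimal single-machine sketch in Python, with threads standing
in for model replicas and a toy linear model in place of an LSTM. All names
(ParameterServer, replica, the learning rate) are hypothetical assumptions for
this sketch; this is not the authors' actual implementation.

```python
import threading
import numpy as np

# Minimal sketch of asynchronous SGD (ASGD): several model replicas compute
# gradients on their own data shards and apply them to a shared parameter
# vector without synchronizing with one another. Hypothetical names; a toy
# linear model stands in for the LSTM acoustic model.

class ParameterServer:
    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.lock = threading.Lock()  # coarse lock; real servers shard params

    def fetch(self):
        with self.lock:
            return self.params.copy()

    def apply_gradient(self, grad, lr=0.01):
        with self.lock:
            self.params -= lr * grad  # update may be computed from stale params

def replica(ps, data_shard, steps=100):
    for _ in range(steps):
        w = ps.fetch()                      # possibly stale parameter copy
        x, y = data_shard[np.random.randint(len(data_shard))]
        grad = 2 * (w @ x - y) * x          # gradient of squared error
        ps.apply_gradient(grad)             # asynchronous update

if __name__ == "__main__":
    dim = 4
    true_w = np.array([1.0, -2.0, 0.5, 3.0])
    data = [(x, true_w @ x) for x in np.random.randn(200, dim)]
    ps = ParameterServer(dim)
    shards = [data[i::4] for i in range(4)]  # four replicas, four shards
    threads = [threading.Thread(target=replica, args=(ps, s)) for s in shards]
    for t in threads: t.start()
    for t in threads: t.join()
    print("learned:", np.round(ps.fetch(), 2), "true:", true_w)
```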
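For readers unfamiliar with the two criteria, the standard textbook
formulations are sketched below; the notation (utterance index u, acoustics
X_u, reference word sequence W_u, state sequence S_W, acoustic scale kappa) is
assumed here, not quoted from the paper. MMI maximizes the log posterior of
the reference transcription, while sMBR maximizes the expected state-level
accuracy under the model's posterior over competing hypotheses.

```latex
% Standard formulations of the two sequence discriminative criteria
% (textbook notation; symbols are assumptions, not taken from the paper).
% S_W is the HMM state sequence for word sequence W, P(W) is the language
% model, and A(W, W_u) counts correctly labeled frames (HMM states).
\begin{align}
  F_{\mathrm{MMI}}  &= \sum_{u} \log
      \frac{p(X_u \mid S_{W_u})^{\kappa}\, P(W_u)}
           {\sum_{W} p(X_u \mid S_W)^{\kappa}\, P(W)} \\
  F_{\mathrm{sMBR}} &= \sum_{u} \sum_{W}
      \frac{p(X_u \mid S_W)^{\kappa}\, P(W)}
           {\sum_{W'} p(X_u \mid S_{W'})^{\kappa}\, P(W')}\; A(W, W_u)
\end{align}
```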
