Acoustic Modelling with CD-CTC-SMBR LSTM RNNS
Venue
ASRU (2015) (to appear)
Publication Year
2015
Authors
Andrew Senior, Hasim Sak, Felix de Chaumont Quitry, Tara N. Sainath, Kanishka Rao
Abstract
This paper describes a series of experiments to extend the application of
Context-Dependent (CD) long short-term memory (LSTM) recurrent neural networks
(RNNs) trained with Connectionist Temporal Classification (CTC) and sMBR loss. Our
experiments, on a noisy, reverberant voice search task, include training with
alternative pronunciations, application to child speech recognition,
combination of multiple models, and convolutional input layers. We also investigate
the latency of CTC models and show that constraining forward-backward alignment in
training can reduce the delay for a real-time streaming speech recognition system.
Finally, we investigate transferring knowledge from one network to another through
alignments.
