Recurrent Neural Networks for Noise Reduction in Robust ASR
Venue
INTERSPEECH (2012)
Publication Year
2012
Authors
Andrew Maas, Quoc V. Le, Tyler M. O’Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng
Abstract
Recent work on deep neural networks as acoustic models for automatic speech
recognition (ASR) has demonstrated substantial performance improvements. We
introduce a model that uses a deep recurrent autoencoder neural network to
denoise input features for robust ASR. The model is trained on stereo (noisy and
clean) audio features to predict clean features given noisy input. The model makes
no assumptions about how noise affects the signal, nor about the existence of
distinct noise environments; instead, it can learn to model any type of distortion
or additive noise given sufficient training data. We demonstrate that the model is
competitive with existing feature-denoising approaches on the Aurora2 task, and
that it outperforms a tandem approach in which deep networks predict phoneme
posteriors directly.
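To make the setup concrete, here is a minimal sketch, not the authors' implementation, of the idea the abstract describes: a recurrent autoencoder that maps a sequence of noisy feature frames to denoised frames, trained on "stereo" pairs (a clean feature sequence and an additively noised copy). The layer sizes, the synthetic data, and the single-layer tanh recurrence are all illustrative assumptions; the paper's actual network is deeper and is trained by backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 13 MFCC-like features per frame, 20 frames.
n_feat, n_hidden, T = 13, 32, 20

# Parameters (in training these would be learned by backpropagation
# through time to minimize the reconstruction loss below).
W_in = rng.normal(0, 0.1, (n_hidden, n_feat))    # input -> hidden
W_rec = rng.normal(0, 0.1, (n_hidden, n_hidden)) # hidden -> hidden (recurrence)
W_out = rng.normal(0, 0.1, (n_feat, n_hidden))   # hidden -> reconstructed frame

def denoise(noisy):
    """Forward pass: predict clean feature frames from noisy frames."""
    h = np.zeros(n_hidden)
    out = np.zeros_like(noisy)
    for t in range(noisy.shape[0]):
        h = np.tanh(W_in @ noisy[t] + W_rec @ h)  # recurrent hidden state
        out[t] = W_out @ h                        # linear reconstruction
    return out

# A "stereo" training pair: clean features plus an additively noised copy.
# No assumption is made about the noise type; any distortion seen in
# training pairs can in principle be learned.
clean = rng.normal(0, 1, (T, n_feat))
noisy = clean + rng.normal(0, 0.5, (T, n_feat))

pred = denoise(noisy)
mse = np.mean((pred - clean) ** 2)  # the loss minimized during training
print(pred.shape, float(mse))
```

At test time the trained network is run over the noisy features and its output is handed to a standard ASR front end in place of the original features.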