Recurrent Neural Networks for Voice Activity Detection
Venue
ICASSP, IEEE (2013), pp. 7378-7382
Publication Year
2013
Authors
Thad Hughes, Keir Mierle
Abstract
We present a novel recurrent neural network (RNN) model for voice activity
detection. Our multi-layer RNN model, in which nodes compute quadratic polynomials,
outperforms a much larger baseline system composed of Gaussian mixture models
(GMMs) and a hand-tuned state machine (SM) for temporal smoothing. All parameters
of our RNN model are optimized together, so that it properly weights its preference
for temporal continuity against the acoustic features in each frame. Our RNN uses
one tenth as many parameters as the GMM+SM baseline system and yields a 26%
reduction in false alarms, reducing overall speech recognition computation time by
17% while reducing word error rate by 1% relative.
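
To make the phrase "nodes compute quadratic polynomials" concrete, here is a minimal sketch of one recurrent layer whose nodes each evaluate a quadratic polynomial of the current frame's features and the previous state. The abstract does not give the paper's actual node form, layer sizes, or training procedure, so every name here (`quadratic_node`, `QuadraticRecurrentLayer`), the random initialization, the clipping of node outputs to [-1, 1], and the choice of the first node as the per-frame score are assumptions for illustration only.

```python
import random


def quadratic_node(inputs, W, b, c):
    """Evaluate x^T W x + b . x + c for one node (assumed form)."""
    n = len(inputs)
    quad = sum(W[i][j] * inputs[i] * inputs[j]
               for i in range(n) for j in range(n))
    lin = sum(b[i] * inputs[i] for i in range(n))
    return quad + lin + c


def clip(value, lo=-1.0, hi=1.0):
    """Keep node outputs bounded; an assumption, not taken from the paper."""
    return max(lo, min(hi, value))


class QuadraticRecurrentLayer:
    """Hypothetical recurrent layer built from quadratic nodes."""

    def __init__(self, n_features, n_state, seed=0):
        rng = random.Random(seed)
        n_in = n_features + n_state  # frame features plus previous state
        self.n_state = n_state
        # One (W, b, c) parameter triple per node, with small random weights.
        self.nodes = [
            (
                [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                 for _ in range(n_in)],
                [rng.uniform(-0.1, 0.1) for _ in range(n_in)],
                0.0,
            )
            for _ in range(n_state)
        ]

    def run(self, frames):
        """Return one score per frame; the state carries temporal context."""
        state = [0.0] * self.n_state
        scores = []
        for frame in frames:
            x = list(frame) + state
            state = [clip(quadratic_node(x, W, b, c))
                     for (W, b, c) in self.nodes]
            scores.append(state[0])  # first node read as the score (assumption)
        return scores


if __name__ == "__main__":
    layer = QuadraticRecurrentLayer(n_features=2, n_state=3)
    print(layer.run([[0.1, 0.2], [0.5, -0.3], [0.0, 0.0]]))
```

Because the previous state is fed back in alongside each frame's features, a single set of parameters governs both the acoustic evidence and the preference for temporal continuity, which is the property the abstract contrasts with a separately hand-tuned smoothing state machine.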
