On Rectified Linear Units For Speech Processing
Venue
38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)
Publication Year
2013
Authors
M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton
Abstract
Deep neural networks have recently become the gold standard for acoustic modeling
in speech recognition systems. The key computational unit of a deep network is a
linear projection followed by a point-wise non-linearity, which is typically a
logistic function. In this work, we show that we can improve generalization and
make training of deep networks faster and simpler by substituting the logistic
units with rectified linear units. These units are linear when their input is
positive and zero otherwise. In a supervised setting, we can successfully train
very deep nets from random initialization on a large vocabulary speech recognition
task, achieving lower word error rates than a logistic network with the same
topology. Similarly, in an unsupervised setting, we show how we can learn sparse
features that can be useful for discriminative tasks. All our experiments are
executed in a distributed environment using several hundred machines and several
hundred hours of speech data.
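
As a minimal sketch in our own notation (the symbols z, W, x, and b below are assumptions for illustration, not taken from the paper), the rectified linear unit described in the abstract replaces the logistic non-linearity applied to the output of the linear projection z = Wx + b:

\mathrm{ReLU}(z) = \max(0, z), \qquad \mathrm{logistic}(z) = \frac{1}{1 + e^{-z}}

The rectified unit passes positive inputs through unchanged and outputs exactly zero otherwise, which is why many of its activations are zero and the learned features tend to be sparse, as the abstract notes for the unsupervised setting.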
