Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

Tara Sainath

Ron J. Weiss

Kevin Wilson

Bo Li

Arun Narayanan

Ehsan Variani

Michiel Bacchiani

Izhak Shafran

Andrew Senior

Kean Chin

Ananya Misra

Chanwoo Kim

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25 (2017), pp. 965-979


Abstract

Multichannel ASR systems commonly separate speech enhancement, including localization, beamforming and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture which performs multichannel filtering in the first layer of the network and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single channel filterbank which computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% compared to a traditional beamforming-based multichannel ASR system and more than 10% compared to a single channel waveform model.
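
As a rough illustration of the beamforming idea the abstract refers to, the sketch below applies a separate FIR filter to each microphone channel and sums the results (classical filter-and-sum). It is not the paper's network: the function names, the toy signals, and the hand-picked taps are all assumptions made for illustration, whereas in the approach described above the filter coefficients would be learned jointly with the acoustic model.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[t] = sum_n taps[n] * signal[t - n], zero-padded at the start."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for n, h in enumerate(taps):
            if t - n >= 0:
                acc += h * signal[t - n]
        out.append(acc)
    return out


def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR taps, then sum across channels."""
    if len(channels) != len(filters):
        raise ValueError("need one filter per channel")
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Hypothetical two-microphone input: the second microphone hears the same
# pulse one sample later than the first.
mic0 = [0.0, 1.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0]

# Hand-picked taps (an assumption, not learned): delay mic0 by one sample so
# the channels line up, and weight each channel by 0.5 to average them.
taps = [[0.0, 0.5], [0.5, 0.0]]
aligned = filter_and_sum([mic0, mic1], taps)
print(aligned)
```

With these taps the two delayed copies of the pulse add coherently at a single sample, which is the sense in which per-channel time-domain filters can favor energy arriving from one direction.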

Research Areas

Speech Processing
