We describe the structure and application of an acoustic room simulator to generate
large-scale simulated data for training deep neural networks for far-field speech
recognition. The system simulates millions of different room dimensions, a wide
distribution of reverberation times and signal-to-noise ratios, and a range of
microphone and sound source locations. Starting from a relatively clean training
set as the source, we create simulated data by randomly sampling a noise
configuration for every new training example. As a result, the acoustic model
is trained using examples that are virtually never repeated. We evaluate the
performance of this room-simulation-based approach using a factored complex Fast
Fourier Transform (CFFT) acoustic model (AM) introduced in our earlier work, which
uses CFFT layers and long short-term memory (LSTM) AMs for joint multichannel
processing and acoustic modeling.
Results show that the simulator-driven approach yields large improvements not only
in simulated test conditions but also in real/rerecorded conditions. This room
simulation system has been employed in training acoustic models, including those
for the recently released Google Home.
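
As an illustration of the per-example sampling described above, the following
is a minimal sketch of drawing a fresh room configuration for each training
example. It is not the simulator itself: the names RoomConfig and
sample_room_config, the uniform distributions, and every numeric range are
assumptions chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class RoomConfig:
    """One sampled noise configuration (hypothetical structure)."""
    room_dims_m: tuple    # (width, length, height) in meters
    rt60_s: float         # reverberation time in seconds
    snr_db: float         # signal-to-noise ratio in dB
    mic_pos_m: tuple      # microphone location inside the room
    source_pos_m: tuple   # sound source location inside the room


def _point_inside(dims, rng, margin=0.1):
    # Draw a point at least `margin` meters away from every wall.
    return tuple(rng.uniform(margin, d - margin) for d in dims)


def sample_room_config(rng):
    # All ranges below are illustrative assumptions, not values from the paper.
    dims = (rng.uniform(3.0, 10.0), rng.uniform(3.0, 8.0), rng.uniform(2.5, 6.0))
    return RoomConfig(
        room_dims_m=dims,
        rt60_s=rng.uniform(0.0, 0.9),
        snr_db=rng.uniform(0.0, 30.0),
        mic_pos_m=_point_inside(dims, rng),
        source_pos_m=_point_inside(dims, rng),
    )


# A new configuration is drawn for every training example, so the same
# clean utterance is almost never paired with the same acoustic conditions.
rng = random.Random(0)
configs = [sample_room_config(rng) for _ in range(3)]
```

Because each parameter is drawn from a continuous range, repeating an identical
configuration is very unlikely, which matches the claim that training examples
are virtually never repeated.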