Robust Speech Recognition Based on Binaural Auditory Processing
INTERSPEECH 2017 (2017) (to appear)
Anjali Menon, Chanwoo Kim, Richard M. Stern
This paper discusses a combination of techniques for improving speech recognition accuracy in the presence of reverberation and spatially-separated interfering sound sources. Interaural Time Delay (ITD), observed as a consequence of the difference in arrival times of a sound to the two ears, is an important feature used by the human auditory system to reliably localize and separate sound sources. In addition, the “precedence effect” helps the auditory system differentiate between the direct sound and its subsequent reflections in reverberant environments. This paper uses a cross-correlation-based measure across the two channels of a binaural signal to isolate the target source by rejecting portions of the signal corresponding to larger ITDs. To overcome the effects of reverberation, the steady state components of speech are suppressed, effectively boosting the onsets, so as to retain the direct sound and suppress the reflections. Experimental results show a significant improvement in recognition accuracy using both these techniques. Cross-correlation-based processing and steady-state suppression are carried out separately, and the order in which these techniques are applied produces differences in the resulting recognition accuracy.