A subband-based stationary-component suppression method using harmonics and power ratio for reverberant speech recognition

Byung Joon Cho
Haeyong Kwon
Ji-Won Cho
Chanwoo Kim
Richard M. Stern
Hyung-Min Park
IEEE SIGNAL PROCESSING LETTERS, 23 (2016), pp. 780-784

Abstract

This letter describes a preprocessing method for reverberant
speech recognition: subband-based stationary-component suppression
using harmonics and power ratio (SHARP). SHARP processing extends a previous
algorithm called Suppression of Slowly varying components and
the Falling edge (SSF), which suppresses the steady-state portions
of subband spectral envelopes. The SSF algorithm tends
to over-subtract these envelopes in highly reverberant environments
when there are high levels of power in previous analysis
frames. The proposed SHARP method prevents excessive suppression
both by boosting the floor value using the harmonics in voiced
speech segments and by inhibiting the subtraction for unvoiced
speech by detecting frames in which power is concentrated in
high-frequency channels. These modifications enable the SHARP
algorithm to improve recognition accuracy by further reducing
the mismatch between power contours of clean and reverberated
speech. Experimental results indicate that the SHARP method
provides better recognition accuracy than the SSF algorithm in
highly reverberant environments. It is also shown that
the performance of the SHARP method can be further improved
by combining it with feature-space maximum likelihood linear
regression (fMLLR).
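
To make the over-subtraction issue concrete, the following is a minimal sketch of SSF-style steady-state suppression on a single subband power contour. It is an illustration only, not the authors' implementation: the function name, the first-order lowpass form, and the values of `forget` and `floor` are assumptions, and the SHARP modifications (harmonic-based floor boosting and the high-frequency power-ratio test) are not implemented.

```python
def suppress_stationary(power, forget=0.4, floor=0.01):
    """Subtract a lowpassed power envelope from one subband, with a floor.

    power  : list of non-negative per-frame power values for one subband
    forget : assumed forgetting factor of the first-order lowpass filter
    floor  : assumed floor coefficient, as a fraction of the frame power
    """
    out = []
    # Assumption: initialise the lowpassed envelope with the first frame.
    lowpassed = power[0] if power else 0.0
    for p in power:
        # Running (slowly varying) estimate of the subband power.
        lowpassed = forget * lowpassed + (1.0 - forget) * p
        # Remove the slowly varying part, but never go below the floor.
        out.append(max(p - lowpassed, floor * p))
    return out


# A steady contour is pushed down to the floor, an onset survives,
# and a frame following high-power frames is clamped to the floor.
steady = suppress_stationary([1.0, 1.0, 1.0, 1.0, 1.0])
onset = suppress_stationary([0.0, 0.0, 0.0, 1.0])
falling = suppress_stationary([1.0, 1.0, 0.1])
```

In the `falling` case the last frame is reduced to `floor * p` because the preceding frames carry high power. This is the behavior the abstract describes as over-subtraction, which SHARP counteracts by raising the floor in voiced segments and by inhibiting subtraction in unvoiced ones.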
