A subband-based stationary-component suppression method using harmonics and power ratio for reverberant speech recognition
Abstract
This letter describes a preprocessing method for reverberant
speech recognition called the subband-based stationary-component
suppression method using harmonics and power ratio (SHARP).
SHARP processing extends a previous
algorithm called Suppression of Slowly varying components and
the Falling edge (SSF), which suppresses the steady-state portions
of subband spectral envelopes. The SSF algorithm tends
to over-subtract these envelopes in highly reverberant environments
when there are high levels of power in previous analysis
frames. The proposed SHARP method prevents excessive suppression
both by boosting the floor value using the harmonics in voiced
speech segments and by inhibiting the subtraction for unvoiced
speech by detecting frames in which power is concentrated in
high-frequency channels. These modifications enable the SHARP
algorithm to improve recognition accuracy by further reducing
the mismatch between power contours of clean and reverberated
speech. Experimental results indicate that the SHARP method
provides better recognition accuracy than the SSF algorithm
in highly reverberant environments. It is also shown that
the performance of the SHARP method can be further improved
by combining it with feature-space maximum likelihood linear
regression (fMLLR).
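
The two ideas summarized above, subtracting a slowly varying envelope estimate with a floor and inhibiting that subtraction when power is concentrated in high-frequency channels, can be illustrated with a minimal sketch. This is not the SHARP implementation: the function name, the first-order running average, the forgetting factor `lam`, the floor coefficient `floor`, the split point `hf_start`, and the threshold `hf_ratio_thresh` are all assumptions chosen for illustration, and the harmonics-based floor boosting for voiced segments is omitted.

```python
import numpy as np

def suppress_stationary(power, lam=0.4, floor=0.01,
                        hf_start=None, hf_ratio_thresh=0.6):
    """Illustrative envelope suppression on a (frames, channels) power array.

    All parameter values are hypothetical defaults, not values from the letter.
    """
    power = np.asarray(power, dtype=float)
    n_frames, n_ch = power.shape
    if hf_start is None:
        hf_start = n_ch // 2  # assumed: upper half of channels counts as "high"

    out = np.empty_like(power)
    lowpassed = power[0].copy()  # running average starts at the first frame
    for m in range(n_frames):
        # First-order lowpass estimate of the slowly varying envelope.
        lowpassed = lam * lowpassed + (1.0 - lam) * power[m]

        # Fraction of the frame's power located in the high channels.
        total = power[m].sum()
        hf_ratio = power[m, hf_start:].sum() / total if total > 0 else 0.0

        if hf_ratio > hf_ratio_thresh:
            # Likely unvoiced frame: inhibit the subtraction entirely.
            out[m] = power[m]
        else:
            # Subtract the envelope estimate, never dropping below the floor.
            out[m] = np.maximum(power[m] - lowpassed, floor * power[m])
    return out
```

With a stationary input the output collapses to the floor, while a sudden onset survives the subtraction, which is the behavior the suppression step is meant to produce. A frame whose power lies mostly in the upper channels passes through unchanged.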