Selection and Combination of Hypotheses for Dialectal Speech Recognition
Venue
ICASSP 2016
Publication Year
2016
Authors
Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro J. Moreno
Abstract
While research has often shown that building dialect-specific Automatic Speech
Recognizers is the optimal approach to dealing with dialectal variations of the
same language, we have observed that dialect-specific recognizers do not always
output the best recognitions. Often enough, another dialectal recognizer outputs a
better recognition than the dialect-specific one. In this paper, we present two
methods to select and combine the best decoded hypothesis from a pool of dialectal
recognizers. We follow a Machine Learning approach and extract features from the
Speech Recognition output along with Word Embeddings and use Shallow Neural
Networks for classification. Our experiments using Dictation and Voice Search data
from the four main Arabic dialects show good WER improvements for the hypothesis
selection scheme, reducing the WER by 2.1% to 12.1% depending on the test set, and
promising results for the hypotheses combination scheme.
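
The selection scheme described in the abstract can be illustrated with a minimal sketch. It assumes each dialectal recognizer contributes one decoded hypothesis and that some scoring function ranks the pool. The `Hypothesis` fields, the `select_hypothesis` helper, and the confidence-based scorer are hypothetical stand-ins: the abstract does not list the actual features, and the trained shallow neural network is represented here only by a generic callable. A word error rate (WER) helper is included to show how a selected hypothesis would be evaluated against a reference transcript.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hypothesis:
    dialect: str       # which dialectal recognizer produced this output
    text: str          # decoded word sequence
    confidence: float  # hypothetical feature taken from the recognizer output


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def select_hypothesis(pool: Sequence[Hypothesis],
                      score: Callable[[Hypothesis], float]) -> Hypothesis:
    """Return the highest-scoring hypothesis in the pool.

    `score` is a placeholder for a trained classifier; any callable works.
    """
    if not pool:
        raise ValueError("pool must contain at least one hypothesis")
    return max(pool, key=score)


# Toy pool with made-up outputs from two dialectal recognizers.
pool = [
    Hypothesis(dialect="dialect_a", text="turn on the light", confidence=0.62),
    Hypothesis(dialect="dialect_b", text="turn on the lights", confidence=0.81),
]
best = select_hypothesis(pool, score=lambda h: h.confidence)
wer = word_error_rate("turn on the lights", best.text)
```

Passing the scorer as a callable keeps the selection step independent of the model behind it, so a trained classifier could replace the confidence-based stand-in without changing the rest of the sketch.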
