Discriminative Keyword Spotting
Abstract: This paper proposes a new approach for keyword spotting,
which is based on large margin and kernel methods rather than on HMMs. Unlike previous
approaches, the proposed method employs a discriminative learning procedure, in which
the learning phase aims at achieving a high area under the ROC curve, as this quantity
is the most common measure for evaluating keyword spotters. The keyword spotter we devise
is based on mapping the input acoustic representation of the speech utterance along
with the target keyword into a vector space. Building on techniques used for large
margin and kernel methods for predicting whole sequences, our keyword spotter distills
to a classifier in this vector space, which separates speech utterances in which the
keyword is uttered from speech utterances in which the keyword is not uttered. We
describe a simple iterative algorithm for training the keyword spotter and discuss its
formal properties, showing theoretically that it attains a high area under the ROC curve.
Experiments on read speech with the TIMIT corpus show that the resulting discriminative
system outperforms the conventional context-independent HMM-based system. Further
experiments using the TIMIT-trained model, but tested on both read (HTIMIT, WSJ) and
spontaneous speech (OGI-Stories), show that, without further training or adaptation to
the new corpus, our discriminative system outperforms the conventional
context-independent HMM-based system.
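The link between the training objective and the ROC curve can be made concrete: the area under the ROC curve equals the fraction of (keyword-present, keyword-absent) utterance pairs that the spotter scores in the correct order, with ties counted as one half. The Python sketch below computes that quantity and trains a linear score with a pairwise, passive-aggressive-style (PA-I) update. It is a minimal illustration under stated assumptions, not the authors' algorithm: the names `empirical_auc` and `train_pairwise`, the fixed-length feature vectors standing in for the utterance-keyword mapping, and the parameters `margin`, `C`, and `epochs` are hypothetical and introduced here only for illustration.

```python
"""Illustrative sketch, NOT the paper's algorithm: empirical AUC and a
pairwise large-margin (PA-I style) update for a linear keyword-spotting score.

Hypothetical assumption: each (utterance, keyword) pair has already been
mapped to a fixed-length feature vector. Positives are utterances that
contain the keyword; negatives are utterances that do not.
"""
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def train_pairwise(pos: List[List[float]], neg: List[List[float]],
                   epochs: int = 10, margin: float = 1.0,
                   C: float = 1.0) -> List[float]:
    """Hypothetical pairwise PA-I trainer for a linear score w . x.

    For each (positive, negative) pair, if the positive does not outscore the
    negative by `margin`, move w along (x_pos - x_neg) just enough to restore
    the margin, with the step size capped at C.
    """
    w = [0.0] * len(pos[0])
    for _ in range(epochs):
        for xp in pos:
            for xn in neg:
                diff = [a - b for a, b in zip(xp, xn)]
                loss = max(0.0, margin - dot(w, diff))  # pairwise hinge loss
                sq_norm = dot(diff, diff)
                if loss > 0.0 and sq_norm > 0.0:
                    tau = min(C, loss / sq_norm)  # PA-I step size
                    w = [wi + tau * di for wi, di in zip(w, diff)]
    return w


if __name__ == "__main__":
    # Toy, made-up feature vectors purely for demonstration.
    pos = [[1.0, 0.2], [0.9, 0.1]]
    neg = [[0.1, 0.8], [0.2, 0.9]]
    w = train_pairwise(pos, neg)
    print(empirical_auc([dot(w, x) for x in pos], [dot(w, x) for x in neg]))  # prints 1.0
```

Because each update targets one misordered pair at a time, driving the pairwise hinge loss toward zero pushes the empirical AUC toward 1; this is why a pairwise objective is a natural surrogate when the AUC is the evaluation measure.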
