Discriminative Keyword Spotting
Venue
Speech Communication (2009), pp. 317-329
Publication Year
2009
Authors
Joseph Keshet, David Grangier, Samy Bengio
Abstract
This paper proposes a new approach for keyword spotting, which is based on large
margin and kernel methods rather than on HMMs. Unlike previous approaches, the
proposed method employs a discriminative learning procedure, in which the learning
phase aims at achieving a high area under the ROC curve, as this quantity is the
most common measure for evaluating keyword spotters. The keyword spotter we devise is
based on mapping the input acoustic representation of the speech utterance along
with the target keyword into a vector space. Building on techniques used for large
margin and kernel methods for predicting whole sequences, our keyword spotter
distills to a classifier in this vector space, which separates speech utterances in
which the keyword is uttered from speech utterances in which the keyword is not
uttered. We describe a simple iterative algorithm for training the keyword spotter
and discuss its formal properties, showing theoretically that it attains high area
under the ROC curve. Experiments on read speech with the TIMIT corpus show that the
resulting discriminative system outperforms the conventional context-independent
HMM-based system. Further experiments using the TIMIT-trained model, but tested on
both read (HTIMIT, WSJ) and spontaneous speech (OGI-Stories), show that, without
further training or adaptation to the new corpus, our discriminative system
outperforms the conventional context-independent HMM-based system.
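
The area under the ROC curve that the abstract refers to can be read as a ranking statistic: the fraction of (positive, negative) utterance pairs in which the utterance containing the keyword receives the higher score. The sketch below illustrates only that quantity, not the paper's training algorithm. The function name `pairwise_auc` and the scores are hypothetical, chosen for illustration.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Area under the ROC curve as the fraction of correctly ranked pairs.

    pos_scores: spotter scores for utterances that contain the keyword.
    neg_scores: spotter scores for utterances that do not contain it.
    Ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score of each kind")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores, not taken from the paper.
pos = [2.1, 0.7, 1.5]
neg = [0.3, 0.9, -0.4]
auc = pairwise_auc(pos, neg)
```

With these made-up scores, eight of the nine pairs are ranked correctly; the one error is the positive utterance scored 0.7 falling below the negative utterance scored 0.9.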
