Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models
Venue
Proc Interspeech (2010)
Publication Year
2010
Authors
Francoise Beaufays, Vincent Vanhoucke, Brian Strope
Abstract
One of the difficult problems of acoustic modeling for Automatic Speech Recognition
(ASR) is how to adequately model the wide variety of acoustic conditions which may
be present in the data. The problem is especially acute for tasks such as Google
Search by Voice, where the amount of speech available per transaction is small, and
adaptation techniques start showing their limitations. As training data from a very
large user population is available, however, it is possible to identify and jointly
model subsets of the data with similar acoustic qualities. We describe a technique
which allows us to perform this modeling at scale on large amounts of data by
learning a tree-structured partition of the acoustic space, and we demonstrate that
we can significantly improve recognition accuracy in various conditions through
unsupervised Maximum Mutual Information (MMI) training. Being fully unsupervised,
this technique scales easily to increasing numbers of conditions.
