Large Vocabulary Automatic Speech Recognition for Children
Venue
Interspeech (2015)
Publication Year
2015
Authors
Hank Liao, Golan Pundak, Olivier Siohan, Melissa Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani
Abstract
Recently, Google launched YouTube Kids, a mobile application for children that
uses a speech recognizer built specifically for recognizing children’s speech. In
this paper we present techniques we explored to build such a system. We describe
the use of a neural network classifier to identify matched acoustic training data
and the filtering of language modeling data to reduce the chance of producing
offensive results. We also compare long short-term memory (LSTM) recurrent networks
to convolutional, LSTM, deep neural networks (CLDNNs). We found that a CLDNN acoustic
model outperforms an LSTM across a variety of different conditions, but does not
specifically model child speech relatively better than adult speech. Overall, these
findings allow us to build a successful, state-of-the-art large vocabulary speech
recognizer for both children and adults.
