Distributed Discriminative Language Models for Google Voice Search
Venue
Proceedings of ICASSP 2012, IEEE, pp. 5017-5021
Publication Year
2012
Authors
Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope
Abstract
This paper considers large-scale linear discriminative language models trained
using a distributed perceptron algorithm. The algorithm is implemented efficiently
using a MapReduce/SSTable framework. This work also introduces the use of large
amounts of unsupervised data (confidence-filtered Google voice-search logs) in
conjunction with a novel training procedure that regenerates word lattices for the
given data with a weaker acoustic model than the one used to generate the
unsupervised transcriptions for the logged data. We observe small but statistically
significant improvements in recognition performance after reranking N-best lists of
a standard Google voice-search data set.
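
The abstract does not spell out the training loop, so the following is only a minimal sketch of how a distributed perceptron for N-best reranking might look. It assumes iterative parameter mixing (each shard runs one perceptron epoch from the current weights, then the per-shard weights are averaged), unigram and bigram count features, and a baseline recognizer score attached to every hypothesis. All names (`features`, `train_shard`, `mix`, `train`) and the toy data are hypothetical, and plain Python loops stand in for the MapReduce/SSTable machinery.

```python
from collections import Counter, defaultdict


def features(text):
    """Unigram and bigram counts of a hypothesis (an assumed feature set)."""
    toks = ["<s>"] + text.split() + ["</s>"]
    feats = Counter(toks[1:-1])                              # unigrams
    feats.update(" ".join(p) for p in zip(toks, toks[1:]))   # bigrams
    return feats


def score(weights, hyp):
    """Baseline recognizer score plus the linear discriminative LM score."""
    text, base = hyp
    return base + sum(weights.get(f, 0.0) * v for f, v in features(text).items())


def rerank(weights, nbest):
    """Return the highest-scoring (text, base_score) pair in an N-best list."""
    return max(nbest, key=lambda hyp: score(weights, hyp))


def train_shard(weights, shard):
    """'Map' step: one perceptron epoch over one shard of (nbest, oracle) pairs."""
    w = defaultdict(float, weights)
    for nbest, oracle in shard:
        best = rerank(w, nbest)
        if best[0] != oracle[0]:
            # Standard perceptron update: reward the oracle, penalize the current best.
            for f, v in features(oracle[0]).items():
                w[f] += v
            for f, v in features(best[0]).items():
                w[f] -= v
    return dict(w)


def mix(shard_weights):
    """'Reduce' step: average the weight vectors returned by the shards."""
    total = defaultdict(float)
    for w in shard_weights:
        for f, v in w.items():
            total[f] += v
    return {f: v / len(shard_weights) for f, v in total.items()}


def train(shards, epochs=3):
    """Iterative parameter mixing: alternate per-shard epochs and averaging."""
    weights = {}
    for _ in range(epochs):
        weights = mix([train_shard(weights, shard) for shard in shards])
    return weights


# Toy data: each example is (N-best list, oracle hypothesis); scores are made up.
nbest_a = [("call mum", -1.0), ("call mom", -1.2)]
nbest_b = [("text jon", -1.5), ("text john", -2.0)]
shards = [[(nbest_a, ("call mom", -1.2))], [(nbest_b, ("text john", -2.0))]]
weights = train(shards)
```

In this sketch, calling `rerank(weights, nbest_a)` after training selects the oracle hypothesis that the baseline scores alone would have ranked second. Averaging after every epoch keeps the shards from drifting apart, which is one common way to distribute perceptron training; the paper's actual mixing strategy may differ.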
