Google Research


Distributed Discriminative Language Models for Google Voice-Search

Abstract

This paper considers large-scale linear discriminative language models trained using a distributed perceptron algorithm. The algorithm is implemented efficiently using a MapReduce/SSTable framework. This work also introduces the use of large amounts of unsupervised data (confidence-filtered Google voice-search logs) in conjunction with a novel training procedure that regenerates word lattices for the given data using a weaker acoustic model than the one used to generate the unsupervised transcriptions of the logged data. We observe small but statistically significant improvements in recognition performance after reranking N-best lists of a standard Google voice-search data set.
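As an illustration only, the Python sketch below shows one common way to distribute perceptron training for N-best reranking. The abstract does not specify the distribution scheme, so this code assumes iterative parameter mixing. In each epoch, every data shard runs one perceptron pass starting from the current mixed weights (the "map" step). The resulting per-shard weight vectors are then averaged uniformly (the "reduce" step). The function names (`perceptron_epoch`, `train_ipm`, `rerank`), the sparse-dict feature representation, and the toy n-gram features are hypothetical. SSTable storage and real MapReduce execution are not modeled, and shards are simply processed in a loop.

```python
from collections import defaultdict

# Hypothetical data layout: an N-best list is a list of (features, word_errors)
# tuples, where features is a sparse dict {feature_name: value}.


def score(weights, feats):
    """Linear model score: dot product of sparse features with the weights."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def perceptron_epoch(weights, shard):
    """One plain-perceptron pass over one shard, starting from `weights`."""
    w = dict(weights)
    for nbest in shard:
        oracle = min(nbest, key=lambda h: h[1])               # fewest word errors
        best = max(nbest, key=lambda h: score(w, h[0]))       # current model's top pick
        if best is not oracle:
            # Move weights toward the oracle and away from the wrong top hypothesis.
            for f, v in oracle[0].items():
                w[f] = w.get(f, 0.0) + v
            for f, v in best[0].items():
                w[f] = w.get(f, 0.0) - v
    return w


def train_ipm(shards, epochs):
    """Iterative parameter mixing (assumed scheme): train per shard, then average."""
    weights = {}
    for _ in range(epochs):
        # "Map": each shard trains independently from the same mixed weights.
        shard_weights = [perceptron_epoch(weights, s) for s in shards]
        # "Reduce": uniform average of the per-shard weight vectors.
        mixed = defaultdict(float)
        for w in shard_weights:
            for f, v in w.items():
                mixed[f] += v / len(shards)
        weights = dict(mixed)
    return weights


def rerank(weights, hyp_features):
    """Return the index of the highest-scoring hypothesis in an N-best list."""
    return max(range(len(hyp_features)), key=lambda i: score(weights, hyp_features[i]))
```

With two toy shards, each containing one N-best list whose second hypothesis is error-free, the first epoch pushes the weights toward the `ngram:a c` feature, and reranking afterwards selects that hypothesis. A production system would more likely use an averaged perceptron and could weight shards unequally when mixing. Uniform averaging of plain perceptron weights is used here only to keep the sketch short.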


Citation: “Distributed Discriminative Language Models for Google Voice-Search”, Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope, Proceedings of ICASSP 2012 (to appear).


©2012 Google