
Google's speech team conducts research in both speech recognition and text-to-speech conversion. Our research is driven by the needs of our users and of the products the team has deployed. In this section you will find papers tackling real-world problems in acoustic modeling, search algorithms, multilinguality, language identification, speech system architectures, data collection tools, and more.
“Distributed Acoustic Modeling with Back-Off N-grams”, Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson, Proceedings of ICASSP 2012 (to appear).
“Distributed Discriminative Language Models for Google Voice-Search”, Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope, Proceedings of ICASSP 2012 (to appear).
“Google's Cross-Dialect Arabic Voice Search”, Fadi Biadsy, Pedro J. Moreno, Martin Jansche, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444.
“Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of Hours of Unsupervised, Noisy Data”, Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012, pp. 4437-4440.
“Japanese and Korean Voice Search”, Mike Schuster, Kaisuke Nakajima, International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 5149-5152.
“Learning improved linear transforms for speech recognition”, Andrew Senior, Youngmin Cho, Jason Weston, ICASSP, 2012.
“Music Models for Music-Speech Separation”, Thad Hughes, Trausti Kristjansson, ICASSP, 2012, pp. 4917-4920.
“Recognition of Multilingual Speech in Mobile Applications”, Hui Lin, Jui-Ting Huang, Francoise Beaufays, Brian Strope, Yun-hsuan Sung, ICASSP, 2012.
“Semi-supervised Discriminative Language Modeling for Turkish ASR”, Murat Saraçlar, Daniel M. Bikel, Keith Hall, Kenji Sagae, 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings (to appear).
“A Web-Based Tool for Developing Multilingual Pronunciation Lexicons”, Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa, 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332.
“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011, pp. 1429-1432.
“Deploying Google Search by Voice in Cantonese”, Yun-hsuan Sung, Martin Jansche, Pedro Moreno, 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868.
“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
“Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice”, Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Michael Riley, Peng Xu, Thorsten Brants, Vida Ha, Will Neveitt, 2011.
“Recognizing English Queries in Mandarin Voice Search”, Hung-An Chang, Yun-hsuan Sung, Brian Strope, Francoise Beaufays, ICASSP, 2011.
“Speech Retrieval”, Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar, Spoken Language Understanding, 2011, pp. 417-446.
“Challenges in Automatic Speech Recognition”, Ciprian Chelba, Johan Schalkwyk, Michiel Bacchiani, Interspeech 2010.
“Decision Tree State Clustering with Word and Syllable Features”, Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan, Interspeech, 2010, pp. 2958-2961.
“Discriminative Topic Segmentation of Text and Speech”, Mehryar Mohri, Pedro Moreno, Eugene Weinstein, International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
“Google Search by Voice: A Case Study”, Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope, Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, 2010, pp. 61-90.
“On-Demand Language Model Interpolation for Mobile Speech Input”, Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk, Interspeech, 2010, pp. 1812-1815.
“Search by Voice in Mandarin Chinese”, Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno, Interspeech 2010, pp. 354-357.
“Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models”, Francoise Beaufays, Vincent Vanhoucke, Brian Strope, Proc. Interspeech, 2010.
“A new quality measure for topic segmentation of text and speech”, Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein, Conference of the International Speech Communication Association (Interspeech), 2009.
“Restoring Punctuation and Capitalization in Transcribed Speech”, Agustín Gravano, Martin Jansche, Michiel Bacchiani, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009, pp. 4741-4744.
“Revisiting Graphemes with Increasing Amounts of Data”, Yun-Hsuan Sung, Thad Hughes, Francoise Beaufays, Brian Strope, ICASSP, 2009.
“Web-derived Pronunciations”, Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009, pp. 4289-4292.
“Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing”, Michiel Bacchiani, Francoise Beaufays, Johan Schalkwyk, Mike Schuster, Brian Strope, Proc. ICASSP, 2008.
“Retrieval and browsing of spoken content”, Ciprian Chelba, Timothy J. Hazen, Murat Saraçlar, IEEE Signal Processing Magazine, vol. 25 (2008), pp. 39-49.
“Speech Recognition with Weighted Finite-State Transducers”, Mehryar Mohri, Fernando C. N. Pereira, Michael Riley, Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, 2008.
“Speech Recognition with Weighted Finite-State Transducers”, Mehryar Mohri, Fernando C. N. Pereira, Michael Riley, Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, 2007.