Speech Processing
Our goal in Speech Technology Research is twofold: to make speaking to the devices around you (at home, in the car), the devices you wear (watch) and the devices you carry (phone, tablet) both ubiquitous and seamless.
Our research focuses on what makes Google unique: computing scale and data. Using large-scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and to experiment with the kinds of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which classes of algorithms merely compensate for a lack of data, and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field in directions that would never have been considered were it not for the availability of orders of magnitude more data.
We are also in a unique position to deliver very user-centric research. Researchers can draw on millions of users talking to Voice Search or Android Voice Input every day, and can conduct live experiments to test and benchmark new algorithms directly in a realistic, controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we keep our users very close to make sure we solve real problems and have real impact.
We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance to every language on the planet. We currently have systems operating in more than 55 languages and we keep expanding our reach to more and more users. The challenge of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and it is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day on YouTube range from lectures to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation and language detection to new levels of difficulty. The payoff is immense: imagine making every lecture on the web accessible in every language; this is the kind of impact we are striving for.
184 Publications
-
AN ACOUSTIC KEYSTROKE TRANSIENT CANCELER FOR SPEECH COMMUNICATION TERMINALS USING A SEMI-BLIND ADAPTIVE FILTER MODEL
Herbert Buchner, Simon Godsill, Jan Skoglund
ICASSP (2016)
-
BI-MAGNITUDE PROCESSING FRAMEWORK FOR NONLINEAR ACOUSTIC ECHO CANCELLATION ON ANDROID DEVICES
Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs
International Workshop on Acoustic Signal Enhancement 2016 (IWAENC2016) (to appear)
-
Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla
Alexander Gutkin, Linne Ha, Martin Jansche, Oddur Kjartansson, Knot Pipatsrisawat, Richard Sproat
SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, 09-12 May 2016, Yogyakarta, Indonesia; Procedia Computer Science, Elsevier B.V., pp. 194-200
-
Contextual prediction models for speech recognition
Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Baeuml
Proceedings of Interspeech 2016 (to appear)
-
Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol, Keith Hall
ACL2016
-
Directly Modeling Voiced and Unvoiced Components in Speech Waveforms by Neural Networks
Keiichi Tokuda, Heiga Zen
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2016), pp. 5640-5644
-
Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen, Michael Riley, Brian Roark
Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41
-
End-to-End Text-Dependent Speaker Verification
Georg Heigold, Ignacio Moreno, Samy Bengio, Noam M. Shazeer
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
-
Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs
Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
-
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak
Proc. Interspeech, San Francisco, CA, USA (2016) (to appear)
-
GLOBALLY OPTIMIZED LEAST-SQUARES POST-FILTERING FOR MICROPHONE ARRAY SPEECH ENHANCEMENT
Yiteng (Arden) Huang, Alejandro Luebs, Jan Skoglund, W. Bastiaan Kleijn
ICASSP (2016)
-
Learning Compact Recurrent Neural Networks
Zhiyun Lu, Vikas Sindhwani, Tara Sainath
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
-
Learning N-gram Language Models from Uncertain Data
Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark
Interspeech (2016)
-
Learning Personalized Pronunciations for Contact Names Recognition
Tony Bruguier, Fuchun Peng, Francoise Beaufays
Interspeech 2016 (to appear)
-
Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition
William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
ICASSP (2016)
-
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks
Proc. Interspeech, ISCA (2016) (to appear)
-
Proc. Interspeech, ISCA (2016) (to appear)
-
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani
Proc. Interspeech, ISCA (2016) (to appear)
-
ON PRE-FILTERING STRATEGIES FOR THE GCC-PHAT ALGORITHM
Hong-Goo Kang, Michael Graczyk, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016) (to appear)
-
Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
-
Personalized Speech Recognition On Mobile Devices
Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
-
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction
Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran
Proc. Interspeech, ISCA (2016) (to appear)
-
Robust Estimation of Reverberation Time Using Polynomial Roots
Ian Kelly, Francis Boland, Jan Skoglund
AES 60th Conference on Dereverberation and Reverberation of Audio, Music, and Speech, Google Ireland Ltd. (2016)
-
Selection and Combination of Hypotheses for Dialectal Speech Recognition
Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro J. Moreno
ICASSP 2016
-
ICASSP, IEEE (2016)
-
TTS for Low Resource Languages: A Bangla Synthesizer
Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, Richard Sproat
10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, Portorož (Slovenia), European Language Resources Association (ELRA), Paris, France, pp. 2005-2010
-
Yannis Agiomyrgiannakis, Zoe Roupakia
ICASSP, IEEE (2016)
-
Guang Wang, Richard F. Lyon, Emmanuel M. Drakakis
IEEE Transactions on Biomedical Circuits and Systems, vol. 9 (2015), pp. 72-86
-
Ehsan Variani, Erik McDermott, Georg Heigold
ICASSP, IEEE (2015)
-
Acoustic Modeling for Speech Synthesis: from HMM to RNN
IEEE ASRU, Scottsdale, Arizona, U.S.A. (2015)
-
Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN
Proc. MLSLP (2015)
-
Acoustic Modelling with CD-CTC-SMBR LSTM RNNS
Andrew Senior, Hasim Sak, Felix de Chaumont Quitry, Tara N. Sainath, Kanishka Rao
ASRU (2015) (to appear)
-
Rohit Prabhavalkar, Raziel Alvarez, Carolina Parada, Preetum Nakkiran, Tara Sainath
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2015), pp. 4704-4708
-
Automatic Pronunciation Verification for Speech Recognition
Kanishka Rao, Fuchun Peng, Françoise Beaufays
ICASSP (2015)
-
Bringing Contextual Information to Google Speech Recognition
Petar Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith Hall, Brian Roark, David Rybach, Pedro Moreno
Interspeech 2015, International Speech Communication Association
-
Composition-based on-the-fly rescoring for salient n-gram biasing
Keith Hall, Eunjoon Cho, Cyril Allauzen, Francoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang
Interspeech 2015, International Speech Communication Association
-
Compressing Deep Neural Networks using a Rank-Constrained Topology
Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada
Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), ISCA (2015), pp. 1473-1477
-
Context dependent phone models for LSTM RNN acoustic modelling
Andrew W. Senior, Hasim Sak, Izhak Shafran
ICASSP (2015), pp. 4585-4589
-
Convolutional Neural Networks for Small-Footprint Keyword Spotting
Interspeech (2015)
-
Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
Tara Sainath, Oriol Vinyals, Andrew Senior, Hasim Sak
ICASSP (2015)
-
DETECTION AND SUPPRESSION OF KEYBOARD TRANSIENT NOISE IN AUDIO STREAMS WITH AUXILIARY KEYBED MICROPHONE
Simon Godsill, Herbert Buchner, Jan Skoglund
ICASSP 2015, IEEE
-
DIRECT-TO-REVERBERANT RATIO ESTIMATION USING A NULL-STEERED BEAMFORMER
James Eaton, Alastair Moore, Patrick Naylor, Jan Skoglund
ICASSP 2015, IEEE
-
Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, Li Deng
IEEE Signal Processing Magazine, vol. 32 (2015), pp. 35-52
-
Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis
Keiichi Tokuda, Heiga Zen
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4215-4219
-
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition
Hasim Sak, Andrew W. Senior, Kanishka Rao, Françoise Beaufays
CoRR, vol. abs/1507.06947 (2015)
-
Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs
Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman
ICASSP (2015)
-
Garbage Modeling for On-device Speech Recognition
Christophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays
Interspeech 2015, International Speech Communication Association (to appear)
-
Geo-location for Voice Search Language Modeling
Ciprian Chelba, Xuedong Zhang, Keith Hall
Interspeech 2015, International Speech Communication Association, pp. 1438-1442
-
Grapheme-to-Phoneme Conversion Using Long Short-Term Memory Recurrent Neural Networks
Kanishka Rao, Fuchun Peng, Hasim Sak, Françoise Beaufays
ICASSP (2015)
-
Improved recognition of contact names in voice commands
Petar Aleksic, Cyril Allauzen, David Elson, Aleks Kracun, Diego Melendo Casado, Pedro J. Moreno
ICASSP 2015
-
Language Modeling in the Era of Abundant Data
Stanford Information Theory Forum (2015)
-
Large Vocabulary Automatic Speech Recognition for Children
Hank Liao, Golan Pundak, Olivier Siohan, Melissa Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani
Interspeech (2015)
-
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Arun Narayanan, Ananya Misra, Kean Chin
INTERSPEECH-2015, ISCA, pp. 3571-3575
-
Learning acoustic frame labeling for speech recognition with recurrent neural networks
Hasim Sak, Andrew W. Senior, Kanishka Rao, Ozan Irsoy, Alex Graves, Françoise Beaufays, Johan Schalkwyk
ICASSP (2015), pp. 4280-4284
-
Learning the Speech Front-end with Raw Waveform CLDNNs
Tara Sainath, Ron J. Weiss, Kevin Wilson, Andrew W. Senior, Oriol Vinyals
Interspeech (2015)
-
Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition
Yu-hsin Chen, Ignacio Lopez Moreno, Tara Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada
Interspeech (2015)
-
Long Short-Term Memory Language Models with Additive Morphological Features for Automatic Speech Recognition
Daniel Renshaw, Keith B. Hall
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015)
-
Multi-Dialectical Languages Effect on Speech Recognition
Mohamed Elfeky, Pedro J. Moreno, Victor Soto
International Conference on Natural Language and Speech Processing (2015)
-
Multitask learning and system combination for automatic speech recognition
Olivier Siohan, David Rybach
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
-
Pruning Sparse Non-negative Matrix N-gram Language Models
Joris Pelemans, Noam M. Shazeer, Ciprian Chelba
Proceedings of Interspeech 2015, ISCA, pp. 1433-1437
-
Query-by-Example Keyword Spotting Using Long Short-Term Memory Networks
Guoguo Chen, Carolina Parada, Tara N. Sainath
ICASSP (2015)
-
Rapid Vocabulary Addition to Context-Dependent Decoder Graphs
Interspeech 2015
-
Sequence-based Class Tagging for Robust Transcription in ASR
Lucy Vasserman, Vlad Schogol, Keith Hall
Interspeech 2015, International Speech Communication Association (to appear)
-
Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data
Ciprian Chelba, Noam M. Shazeer
Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)
-
Speaker Location and Microphone Spacing Invariant Acoustic Modeling from Raw Multichannel Waveforms
Tara N. Sainath, Ron J. Weiss, Kevin Wilson, Arun Narayanan, Michiel Bacchiani, Andrew Senior
ASRU (2015)
-
Speech Acoustic Modeling from Raw Multichannel Waveforms
Yedid Hoshen, Ron Weiss, Kevin W Wilson
International Conference on Acoustics, Speech, and Signal Processing, IEEE (2015)
-
Statistical parametric speech synthesis: from HMM to LSTM-RNN
RTTH Summer School on Speech Technology -- A Deep Learning Perspective, Barcelona, Spain (2015)
-
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4470-4474
-
ViSQOL: an objective speech quality model
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015 (13) (2015), pp. 1-18
-
Vocaine the Vocoder and Applications in Speech Synthesis
ICASSP, IEEE (2015) (to appear)
-
A big data approach to acoustic model training corpus selection
Olga Kapralova, John Alex, Eugene Weinstein, Pedro Moreno, Olivier Siohan
Conference of the International Speech Communication Association (Interspeech) (2014)
-
An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech
Alastair H Moore, Patrick A Naylor, Jan Skoglund
Proceedings of European Signal Processing Conference (EUSIPCO) 2014
-
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks
Georg Heigold, Erik McDermott, Vincent Vanhoucke, Andrew Senior, Michiel Bacchiani
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Firenze, Italy (2014)
-
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data
Erik McDermott, Georg Heigold, Pedro Moreno, Andrew Senior, Michiel Bacchiani
Interspeech, ISCA (2014)
-
Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition
M. Bacchiani, A. Senior, G. Heigold
Proceedings of the European Conference on Speech Communication and Technology (2014) (to appear)
-
Automatic Language Identification Using Deep Neural Networks
Ignacio Lopez-Moreno, Javier Gonzalez-Dominguez, Oldrich Plchot
Proc. ICASSP, IEEE (2014)
-
Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Hasim Sak
Interspeech (2014)
-
Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models
Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton
Proceedings of Interspeech 2014
-
Backoff Inspired Features for Maximum Entropy Language Models
Fadi Biadsy, Keith Hall, Pedro Moreno, Brian Roark
Proceedings of Interspeech, ISCA (2014)
-
Computer-aided quality assurance of an Icelandic pronunciation dictionary
LREC 2014, Reykjavik
-
Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models
M. Bacchiani, D. Rybach
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
-
Deep Mixture Density Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 3872-3876
-
Deep Neural Networks for Small Footprint Text-dependent Speaker Verification
Ehsan Variani, Xin Lei, Erik McDermott, Ignacio Lopez Moreno, Javier Gonzalez-Dominguez
Proc. ICASSP, IEEE (2014)
-
Discriminative pronunciation modeling for dialectal speech recognition
Maider Lehr, Kyle Gorman, Izhak Shafran
Proc. Interspeech (2014) (to appear)
-
Encoding Linear Models As Weighted Finite-State Transducers
Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark
Interspeech 2014, ISCA, pp. 1258-1262
-
Fine Context, Low-rank, Softplus Deep Neural Networks for Mobile Speech Recognition
Andrew Senior, Xin Lei
Proc. ICASSP (2014) (to appear)
-
Frame by Frame Language Identification in Short Utterances using Deep Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez
Neural Networks Special Issue: Neural Network Learning in Big Data (2014)
-
GMM-Free DNN Training
A. Senior, G. Heigold, M. Bacchiani, H. Liao
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
-
Improving DNN Speaker Independence with I-vector Inputs
Andrew Senior, Ignacio Lopez-Moreno
Proc. ICASSP, IEEE (2014)
-
JustSpeak: Enabling Universal Voice Control on Android
Yu Zhong, T. V. Raman, Casey Burkhardt, Fadi Biadsy, Jeffrey P. Bigham
W4A 2014
-
Large-Scale Speaker Identification
Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno
Proc. ICASSP, IEEE (2014)
-
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
Hasim Sak, Andrew W. Senior, Françoise Beaufays
CoRR, vol. abs/1402.1128 (2014)
-
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Hasim Sak, Andrew W. Senior, Françoise Beaufays
INTERSPEECH (2014), pp. 338-342
-
Pronunciation Learning for Named-Entities through Crowd-Sourcing
Attapol Rutherford, Fuchun Peng, Françoise Beaufays
Proceedings of Interspeech (2014)
-
Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks
Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao
Interspeech (2014)
-
Sinusoidal Interpolation Across Missing Data
W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75
-
Small-Footprint Keyword Spotting using Deep Neural Networks
Guoguo Chen, Carolina Parada, Georg Heigold
ICASSP, IEEE (2014)
-
Statistical Parametric Speech Synthesis
UKSpeech Conference, Edinburgh, UK (2014)
-
Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models
Xavi Gonzalvo, Monika Podsiadlo
Proceedings of Interspeech, ISCA (2014)
-
Training Data Selection Based On Context-Dependent State Matching
Proceedings of ICASSP 2014
-
Word Embeddings for Speech Recognition
Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech (2014)
-
Yannis Agiomyrgiannakis, Florian Eyben
ICASSP, IEEE (2013)
-
Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices
Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen
Interspeech (2013)
-
An Empirical study of learning rates in deep neural networks for speech recognition
Andrew Senior, Georg Heigold, Marc'aurelio Ranzato, Ke Yang
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)
-
Deep Learning in Speech Synthesis
8th ISCA Speech Synthesis Workshop, Barcelona, Spain (2013)
-
Deep Neural Networks with Auxiliary Gaussian Mixture Models for Real-Time Speech Recognition
Xin Lei, Hui Lin, Georg Heigold
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
-
Direct construction of compact context-dependency transducers from data
David Rybach, Michael Riley, Chris Alberti
Computer Speech & Language (2013) (to appear)
-
Ciprian Chelba, Johan Schalkwyk
Mobile Speech and Advanced Natural Language Solutions, Springer Science+Business Media, New York (2013), pp. 197-229
-
Language Model Verbalization for Automatic Speech Recognition
Hasim Sak, Françoise Beaufays, Kaisuke Nakajima, Cyril Allauzen
Proc ICASSP, IEEE (2013)
-
Language Modeling Capitalization
Françoise Beaufays, Brian Strope
Proc ICASSP, IEEE (2013) (to appear)
-
Large Scale Distributed Acoustic Modeling With Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169
-
Large Scale Distributed Acoustic Modeling With Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
ICSI, Berkeley, California (2013)
-
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
Hank Liao, Erik McDermott, Andrew Senior
ASRU (2013)
-
Mixture of mixture n-gram language models
Hasim Sak, Cyril Allauzen, Kaisuke Nakajima, Françoise Beaufays
ASRU (2013), pp. 31-36
-
Monitoring the Effects of Temporal Clipping on VoIP Speech Quality
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
Interspeech 2013, pp. 1188-1192
-
Multiframe Deep Neural Networks for Acoustic Modeling
Vincent Vanhoucke, Matthieu Devin, Georg Heigold
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
-
Multilingual acoustic models using distributed deep neural networks
Georg Heigold, Vincent Vanhoucke, Andrew Senior, Patrick Nguyen, Marc'aurelio Ranzato, Matthieu Devin, Jeff Dean
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
-
On Rectified Linear Units For Speech Processing
M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton
38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)
-
Pre-Initialized Composition for Large-Vocabulary Speech Recognition
Interspeech 2013, pp. 666-670
-
RAPID ADAPTATION FOR MOBILE SPEECH APPLICATIONS
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2013)
-
Rate-Distortion Optimization for Multichannel Audio Compression
Minyue Li, Jan Skoglund, W. Bastiaan Kleijn
2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
-
Recurrent Neural Networks for Voice Activity Detection
Thad Hughes, Keir Mierle
ICASSP, IEEE (2013), pp. 7378-7382
-
Robustness of Speech Quality Metrics to Background Noise and Network Degradations: Comparing VISQOL, PESQ and POLQA
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 3697-3701
-
Search Results Based N-Best Hypothesis Rescoring With Maximum Entropy Classification
Fuchun Peng, Scott Roy, Ben Shahshahani, Françoise Beaufays
Proceedings of ASRU (2013)
-
Smoothed marginal distribution constraints for language modeling
Brian Roark, Cyril Allauzen, Michael Riley
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52
-
Speaker Adaptation of Context Dependent Deep Neural Networks
International Conference on Acoustics, Speech, and Signal Processing (2013)
-
Speech and Natural Language: Where Are We Now And Where Are We Headed?
Mobile Voice Conference, San Francisco (2013)
-
Statistical Parametric Speech Synthesis Using Deep Neural Networks
Heiga Zen, Andrew Senior, Mike Schuster
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 7962-7966
-
Written-Domain Language Modeling for Automatic Speech Recognition
Hasim Sak, Yun-hsuan Sung, Françoise Beaufays, Cyril Allauzen
Interspeech (2013)
-
iVector-based Acoustic Data Selection
Olivier Siohan, Michiel Bacchiani
Proceedings of Interspeech (2013)
-
Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition
Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke
Proceedings of Interspeech 2012
-
Building adaptive dialogue systems via Bayes-adaptive POMDP
Shaowei Png, Joelle Pineau, B. Chaib-draa
IEEE Journal of Selected Topics in Signal Processing, vol. 6(8) (2012), pp. 917-927
-
Chapter 17: Uncertainty Decoding, In Virtanen, Singh, & Raj (Eds.) Techniques for Noise Robustness in Automatic Speech Recognition.
Wiley (2012), pp. 463-485
-
Continuous Space Discriminative Language Modeling
Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud’hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark, Kenji Sagae, Murat Saraclar, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley
ICASSP 2012
-
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Signal Processing Magazine (2012)
-
Distributed Acoustic Modeling with Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
-
Distributed Discriminative Language Models for Google Voice Search
Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope
Proceedings of ICASSP 2012, IEEE, pp. 5017-5021
-
Estimating Word-Stability During Incremental Speech Recognition
Ian McGraw, Alexander Gruenstein
Interspeech (2012)
-
Google's Cross-Dialect Arabic Voice Search
Fadi Biadsy, Pedro J. Moreno, Martin Jansche
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444
-
Hallucinated N-Best Lists for Discriminative Language Modeling
Kenji Sagae, Maider Lehr, Emily Tucker Prud’hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraçlar, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)
-
Haptic Voice Recognition Grand Challenge
K. Sim, S. Zhao, K. Yu, H. Liao
14th ACM International Conference on Multimodal Interaction (2012)
-
IMPROVED PREDICTION OF NEARLY-PERIODIC SIGNALS
Bastiaan Kleijn, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
-
Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440
-
Japanese and Korean Voice Search
Mike Schuster, Kaisuke Nakajima
International Conference on Acoustics, Speech and Signal Processing, IEEE (2012), pp. 5149-5152
-
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Leif Johnson, Michael Riley, Peng Xu, Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt
University of Toronto (2012)
-
Large Scale Language Modeling in Automatic Speech Recognition
Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar
Google (2012)
-
Large-scale Discriminative Language Model Reranking for Voice Search
Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics, pp. 41-49
-
Learning improved linear transforms for speech recognition
Andrew Senior, Youngmin Cho, Jason Weston
ICASSP, IEEE (2012)
-
Music Models for Music-Speech Separation
Thad Hughes, Trausti Kristjansson
ICASSP, IEEE (2012), pp. 4917-4920
-
Optimal Size, Freshness and Time-frame for Voice Search Vocabulary
Google (2012)
-
Recognition of Multilingual Speech in Mobile Applications
Hui Lin, Jui-Ting Huang, Francoise Beaufays, Brian Strope, Yun-hsuan Sung
ICASSP (2012)
-
Recurrent Neural Networks for Noise Reduction in Robust ASR
Andrew Maas, Quoc V. Le, Tyler M. O’Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng
INTERSPEECH (2012)
-
Semi-supervised Discriminative Language Modeling for Turkish ASR
Murat Saraçlar, Daniel M. Bikel, Keith Hall, Kenji Sagae
2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, IEEE, Kyoto, Japan
-
Spectral Intersections for Non-Stationary Signal Separation
Trausti Kristjansson, Thad Hughes
Proceedings of InterSpeech 2012, Portland, OR
-
Speech/Nonspeech Segmentation in Web Videos
Proceedings of InterSpeech 2012
-
VISQOL: THE VIRTUAL SPEECH QUALITY OBJECTIVE LISTENER
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
-
Voice Query Refinement
Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk
Interspeech (2012)
-
A Web-Based Tool for Developing Multilingual Pronunciation Lexicons
Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa
12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332
-
Bayesian Language Model Interpolation for Mobile Speech Input
Interspeech 2011, pp. 1429-1432
-
Deploying Google Search by Voice in Cantonese
Yun-hsuan Sung, Martin Jansche, Pedro Moreno
12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868
-
Discriminative Features for Language Identification
C. Alberti, M. Bacchiani
INTERSPEECH (2011)
-
Improving the speed of neural networks on CPUs
Vincent Vanhoucke, Andrew Senior, Mark Z. Mao
Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011
-
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Michael Riley, Peng Xu, Thorsten Brants, Vida Ha, Will Neveitt
OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)
-
Recognizing English Queries in Mandarin Voice Search
Hung-An Chang, Yun-hsuan Sung, Brian Strope, Francoise Beaufays
ICASSP (2011)
-
Speech Retrieval
Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar
Spoken Language Understanding, John Wiley and Sons, Ltd (2011), pp. 417-446
-
Summary of Opus listening test results
Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund
IETF (2011)
-
TechWare: Mobile Media Search Resources [Best of the Web]
Z. Liu, M. Bacchiani
IEEE Signal Processing Magazine, vol. 28 (2011), pp. 142-145
-
Unsupervised Testing Strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei
Interspeech 2011, pp. 1685-1688
-
Challenges in Automatic Speech Recognition
Ciprian Chelba, Johan Schalkwyk, Michiel Bacchiani
Interspeech 2010
-
Decision Tree State Clustering with Word and Syllable Features
Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan
Interspeech, ISCA (2010), pp. 2958-2961
-
Discriminative Topic Segmentation of Text and Speech
Mehryar Mohri, Pedro Moreno, Eugene Weinstein
International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
-
Google Search by Voice: A Case Study
Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope
Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90
-
On-Demand Language Model Interpolation for Mobile Speech Input
Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk
Interspeech (2010), pp. 1812-1815
-
Search by Voice in Mandarin Chinese
Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno
Interspeech 2010, pp. 354-357
-
Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models
Francoise Beaufays, Vincent Vanhoucke, Brian Strope
Proc Interspeech (2010)
-
A new quality measure for topic segmentation of text and speech
Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein
Conference of the International Speech Communication Association (Interspeech) (2009)
-
Restoring Punctuation and Capitalization in Transcribed Speech
Agustín Gravano, Martin Jansche, Michiel Bacchiani
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4741-4744
-
Revisiting Graphemes with Increasing Amounts of Data
Yun-Hsuan Sung, Thad Hughes, Francoise Beaufays, Brian Strope
ICASSP, IEEE (2009)
-
Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292
-
Confidence Scores for Acoustic Model Adaptation
C. Gollan, M. Bacchiani
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2008)
-
Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing
Michiel Bacchiani, Francoise Beaufays, Johan Schalkwyk, Mike Schuster, Brian Strope
Proc. ICASSP (2008)
-
Retrieval and Browsing of Spoken Content
Ciprian Chelba, Timothy J. Hazen, Murat Saraçlar
Signal Processing Magazine, IEEE, vol. 25 (2008), pp. 39-49
-
Speech Recognition with Weighted Finite-State Transducers
Mehryar Mohri, Fernando C. N. Pereira, Michael Riley
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
-
Speech Recognition with Weighted Finite-State Transducers
Mehryar Mohri, Fernando C. N. Pereira, Michael Riley
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)
