Speech Processing
Our goal in Speech Technology Research is twofold: to make speaking to the devices around you (at home, in the car), the devices you wear (watch), and the devices you carry (phone, tablet) both ubiquitous and seamless.
Our research focuses on what makes Google unique: computing scale and data. Using large-scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and to experiment with the kinds of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed, and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: How do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which classes of algorithms merely compensate for a lack of data, and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and that they steer the whole field in directions that would never have been considered were it not for the availability of orders of magnitude more data.
We are also in a unique position to deliver very user-centric research. Researchers can draw on the millions of users who talk to Voice Search or Android Voice Input every day, and can conduct live experiments to test and benchmark new algorithms directly in a realistic, controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we keep our users very close to make sure we solve real problems and have real impact.
We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance in every language on the planet. We currently have systems operating in more than 55 languages, and we keep expanding our reach to more and more users. The challenge of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach had never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and it is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day to YouTube range from lectures and newscasts to music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation, and language detection to new levels of difficulty. The payoff is immense: imagine making every lecture on the web accessible in every language. This is the kind of impact we are striving for.
264 Publications
(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding
Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck
Proceedings of the IEEE ICASSP (2018)
An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model
Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar
ICASSP (2018)
Decoding the auditory brain with canonical component analysis
Alain de Cheveigné, Daniel D. E. Wong, Giovanni M. Di Liberto, Jens Hjortkjaer, Malcolm Slaney, Edmund Lalor
NeuroImage (2018)
Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models
Rohit Prabhavalkar, Tara Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan
ICASSP 2018 (to appear)
Multilingual Speech Recognition with a Single End-to-End Model
Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao
ICASSP (2018)
Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio
ICASSP (2018)
Sound source separation using phase difference and reliable mask selection
Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard M. Stern
ICASSP (2018) (to appear)
Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani
ICASSP 2018 (2018)
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Chung-Cheng Chiu, Tara Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Katya Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani
ICASSP (2018) (to appear)
A Cascade Architecture for Keyword Spotting on Mobile Devices
Alexander Gruenstein, Raziel Alvarez, Chris Thornton, Mohammadali Ghodrat
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)
A Comparison of Sequence-to-Sequence Models for Speech Recognition
Rohit Prabhavalkar, Kanishka Rao, Tara Sainath, Bo Li, Leif Johnson, Navdeep Jaitly
Interspeech 2017, ISCA (2017)
A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition
Herman Kamper, Aren Jansen, Sharon Goldwater
Computer Speech and Language (2017) (to appear)
A more general method for pronunciation learning
Antoine Bruguier, Dan Gnanapragasam, Francoise Beaufays, Kanishka Rao, Leif Johnson
Interspeech 2017 (2017)
Acoustic Modeling for Google Home
Bo Li, Tara Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean Chin, Khe Chai Sim, Ron J. Weiss, Kevin Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Rick Rose, Matt Shannon
INTERSPEECH 2017 (2017)
An Analysis of "Attention" in Sequence-to-Sequence Models
Rohit Prabhavalkar, Tara Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly
Interspeech 2017, ISCA (2017)
Approaches for Neural-Network Language Model Adaptation
Fadi Biadsy, Michael Alexander Nirschl, Min Ma, Shankar Kumar
Interspeech 2017, Stockholm, Sweden (2017)
Areal and Phylogenetic Features for Multilingual Speech Synthesis
Alexander Gutkin, Richard Sproat
Proc. of Interspeech 2017, ISCA, August 20–24, 2017, Stockholm, Sweden, pp. 2078-2082
Attention-Based Models for Text-Dependent Speaker Verification
F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Binaural processing for robust speech recognition of degraded speech
Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern
IEEE Automatic Speech Recognition and Understanding Workshop (2017)
Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals
Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro
Interspeech 2017 (2017)
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
Chanwoo Kim, Ehsan Variani, Arun Narayanan, Michiel Bacchiani
arXiv (2017)
Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani
Interspeech 2017 (2017)
Endpoint detection using grid long short-term memory networks for streaming speech recognition
Bo Li, Carolina Parada, Gabor Simko, Shuo-yiin Chang, Tara Sainath
In Proc. Interspeech 2017 (to appear)
Generalized End-to-End Loss for Speaker Verification
Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno
Chanwoo Kim, Ananya Misra, Kean Chin, Thad Hughes, Arun Narayanan, Tara Sainath, Michiel Bacchiani
Interspeech 2017 (2017), pp. 379-383
Generative Model-Based Text-to-Speech Synthesis
MIT (2017)
Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vit
Interspeech (2017)
Highway-LSTM and Recurrent Highway Networks for Speech Recognition
Proc. Interspeech 2017, ISCA
Human and Machine Hearing: Extracting Meaning from Sound
Cambridge University Press (2017)
Improved end-of-query detection for streaming speech recognition
Carolina Parada, Gabor Simko, Matt Shannon, Shuo-yiin Chang
Proc. Interspeech 2017 (2017) (to appear)
Incoherent idempotent ambisonics rendering
W. Bastiaan Kleijn, Andrew Allen, Jan Skoglund, Felicia Lim
2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Joint Wideband Source Localization and Acquisition Based on a Grid-Shift Approach
Christos Tzagkarakis, Bastiaan Kleijn, Jan Skoglund
2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Keyword Spotting for Google Assistant Using Contextual Speech Recognition
Assaf Michaely, Carolina Parada, Frank Zhang, Gabor Simko, Petar Aleksic
Language Modeling in the Era of Abundant Data
AI With the Best online conference. (2017)
Latent Sequence Decompositions
William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly
ICLR (2017)
Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models
ICASSP 2017 (to appear)
Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition
Tara Sainath, Ron J. Weiss, Kevin Wilson, Bo Li, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25 (2017), pp. 965-979
On Lattice Generation for Large Vocabulary Speech Recognition
David Rybach, Johan Schalkwyk, Michael Riley
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan (2017)
Optimizing expected word error rate via sampling for speech recognition
Matt Shannon
Proc. Interspeech 2017 (2017) (to appear)
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis Carlos Cobo Rus, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alexander Graves, Helen King, Thomas Walters, Dan Belov, Demis Hassabis
Google DeepMind (2017)
Practically Efficient Nonlinear Acoustic Echo Cancellers Using Cascaded Block RLS and FLMS Adaptive Filters
Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs
ICASSP (2017)
Raw Multichannel Processing Using Deep Neural Networks
Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Bo Li, Ehsan Variani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim
New Era for Robust Speech Recognition: Exploiting Deep Learning, Springer (2017)
Robust Speech Recognition Based on Binaural Auditory Processing
Anjali Menon, Chanwoo Kim, Richard M. Stern
INTERSPEECH 2017 (2017), pp. 3872-3876
Robust and low-complexity blind source separation for meeting rooms
W. Bastiaan Kleijn, Felicia Lim
Proceedings Fifth Joint Workshop on Hands-free Speech Communication and Microphone Arrays (2017)
Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap
Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy
The 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 2725-2729 (to appear)
Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
Streaming Small-Footprint Keyword Spotting Using Sequence-to-Sequence Models
Yanzhang (Ryan) He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw
Automatic Speech Recognition and Understanding (ASRU), 2017 IEEE Workshop on
Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM
Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro Moreno
ASRU 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
Interspeech (2017)
Trainable Frontend For Robust and Far-Field Keyword Spotting
Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous
Proc. IEEE ICASSP 2017, New Orleans, LA
Uncovering Latent Style Factors for Expressive Speech Synthesis
Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous
NIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio) (2017) (to appear)
Proc. of Interspeech 2017, ISCA, August 20–24, Stockholm, Sweden, pp. 2183-2187
Very Deep Convolutional Networks for End-to-End Speech Recognition
Yu Zhang, William Chan, Navdeep Jaitly
ICASSP (2017)
Wavenet based low rate speech coding
W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters
arXiv preprint arXiv:1712.01120 (2017)
Byung Joon Cho, Haeyong Kwon, Ji-Won Cho, Chanwoo Kim, Richard M. Stern, Hyung-Min Park
IEEE SIGNAL PROCESSING LETTERS, vol. 23 (2016), pp. 780-784
Herbert Buchner, Simon Godsill, Jan Skoglund
ICASSP (2016)
AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech
Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley
NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop (to appear)
Mortaza Doulaty, Richard Rose, Olivier Siohan
Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)
Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs
International Workshop on Acoustic Signal Enhancement 2016 (IWAENC2016)
Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla
Alexander Gutkin, Linne Ha, Martin Jansche, Oddur Kjartansson, Knot Pipatsrisawat, Richard Sproat
SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, 09-12 May 2016, Yogyakarta, Indonesia; Procedia Computer Science, Elsevier B.V., pp. 194-200
Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani
Interspeech 2016 (2016)
Contextual prediction models for speech recognition
Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Baeuml
Proceedings of Interspeech 2016
Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol, Keith Hall
Directly Modeling Voiced and Unvoiced Components in Speech Waveforms by Neural Networks
Keiichi Tokuda, Heiga Zen
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2016), pp. 5640-5644
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Austin Waters, Yevgen Chebotar
Interspeech (2016)
Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen, Michael Riley, Brian Roark
Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41
End-to-End Text-Dependent Speaker Verification
Georg Heigold, Ignacio Moreno, Samy Bengio, Noam M. Shazeer
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs
Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani
International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak
Proc. Interspeech, San Francisco, CA, USA (2016)
Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection
Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada
Flatstart-CTC: a new acoustic model training procedure for speech recognition
Andrew Senior, Hasim Sak, Kanishka Rao
Yiteng (Arden) Huang, Alejandro Luebs, Jan Skoglund, W. Bastiaan Kleijn
ICASSP (2016)
High quality agreement-based semi-supervised training data for acoustic modeling
Félix de Chaumont Quitry, Asa Oines, Pedro Moreno, Eugene Weinstein
2016 IEEE Workshop on Spoken Language Technology
Learning Compact Recurrent Neural Networks
Zhiyun Lu, Vikas Sindhwani, Tara Sainath
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Learning N-gram Language Models from Uncertain Data
Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark
Interspeech (2016)
Learning Personalized Pronunciations for Contact Names Recognition
Tony Bruguier, Fuchun Peng, Francoise Beaufays
Interspeech 2016 (to appear)
Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition
William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
ICASSP (2016)
Lower Frame Rate Neural Network Acoustic Models
Interspeech (2016)
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks
Proc. Interspeech, ISCA (2016) (to appear)
Proc. Interspeech, ISCA (2016) (to appear)
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani
Proc. Interspeech, ISCA (2016)
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
Hagen Soltau, Hank Liao, Hasim Sak
ArXiv e-prints (2016)
Hong-Goo Kang, Michael Graczyk, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)
Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
On the Efficient Representation and Execution of Deep Acoustic Models
Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin
Proceedings of Annual Conference of the International Speech Communication Association (Interspeech) (2016)
Personalized Speech Recognition On Mobile Devices
Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
Chanwoo Kim, Richard M. Stern
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks
Daan van Esch, Kanishka Rao, Mason Chua
Proceedings of InterSpeech 2016 (to appear)
Pynini: A Python library for weighted finite-state grammar compilation
Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (2016), pp. 75-80
Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer
Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silen
INTERSPEECH 2016, Sep 8-12, San Francisco, USA, pp. 2238-2242
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction
Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran
Proc. Interspeech, ISCA (2016)
Robust Estimation of Reverberation Time Using Polynomial Roots
Ian Kelly, Francis Boland, Jan Skoglund
AES 60th Conference on Dereverberation and Reverberation of Audio, Music, and Speech, Google Ireland Ltd. (2016)
Selection and Combination of Hypotheses for Dialectal Speech Recognition
Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro J. Moreno
Semantic Model for Fast Tagging of Word Lattices
IEEE Spoken Language Technology (SLT) Workshop (2016) (to appear)
TTS for Low Resource Languages: A Bangla Synthesizer
Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, Richard Sproat
10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, European Language Resources Association (ELRA), Portorož, Slovenia, pp. 2005-2010
Towards Acoustic Model Unification Across Dialects
Austin Waters, Meysam Bastani, Mohamed G. Elfeky, Pedro Moreno, Xavier Velez
2016 IEEE Workshop on Spoken Language Technology
Unsupervised Context Learning For Speech Recognition
Assaf Michaely, Justin Scheiner, Mohammadreza Ghodsi, Petar Aleksic, Zelin Wu
Spoken Language Technology (SLT) Workshop, IEEE (2016)
Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
Aren Jansen, Herman Kamper, Sharon Goldwater
IEEE Transactions on Audio, Speech, and Language Processing (2016)
Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen
Proc. ISCA SSW9 (2016), pp. 238-245
Yannis Agiomyrgiannakis, Zoe Roupakia
Guang Wang, Richard F. Lyon, Emmanuel M. Drakakis
IEEE Transactions on Biomedical Circuits and Systems, vol. 9 (2015), pp. 72-86
Ehsan Variani, Erik McDermott, Georg Heigold
Acoustic Modeling for Speech Synthesis: from HMM to RNN
IEEE ASRU, Scottsdale, Arizona, U.S.A. (2015)
Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN
Proc. MLSLP (2015)
Acoustic Modelling with CD-CTC-SMBR LSTM RNNS
Andrew Senior, Hasim Sak, Felix de Chaumont Quitry, Tara N. Sainath, Kanishka Rao
ASRU (2015)
Rohit Prabhavalkar, Raziel Alvarez, Carolina Parada, Preetum Nakkiran, Tara Sainath
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2015), pp. 4704-4708
Automatic Pronunciation Verification for Speech Recognition
Kanishka Rao, Fuchun Peng, Françoise Beaufays
ICASSP (2015)
Bringing Contextual Information to Google Speech Recognition
Petar Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith Hall, Brian Roark, David Rybach, Pedro Moreno
Interspeech 2015, International Speech Communication Association
Composition-based on-the-fly rescoring for salient n-gram biasing
Keith Hall, Eunjoon Cho, Cyril Allauzen, Francoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang
Interspeech 2015, International Speech Communication Association
Compressing Deep Neural Networks using a Rank-Constrained Topology
Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada
Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), ISCA (2015), pp. 1473-1477
Context dependent phone models for LSTM RNN acoustic modelling
Andrew W. Senior, Hasim Sak, Izhak Shafran
ICASSP (2015), pp. 4585-4589
Convolutional Neural Networks for Small-Footprint Keyword Spotting
Interspeech (2015)
Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
Tara Sainath, Oriol Vinyals, Andrew Senior, Hasim Sak
ICASSP (2015)
Simon Godsill, Herbert Buchner, Jan Skoglund
James Eaton, Alastair Moore, Patrick Naylor, Jan Skoglund
Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, Li Deng
IEEE Signal Processing Magazine, vol. 32 (2015), pp. 35-52
Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis
Keiichi Tokuda, Heiga Zen
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4215-4219
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition
Hasim Sak, Andrew W. Senior, Kanishka Rao, Françoise Beaufays
CoRR, vol. abs/1507.06947 (2015)
Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs
Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman
ICASSP (2015)
Garbage Modeling for On-device Speech Recognition
Christophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays
Interspeech 2015, International Speech Communication Association (to appear)
Geo-location for Voice Search Language Modeling
Ciprian Chelba, Xuedong Zhang, Keith Hall
Interspeech 2015, International Speech Communication Association, pp. 1438-1442
Grapheme-to-Phoneme Conversion Using Long Short-Term Memory Recurrent Neural Networks
Kanishka Rao, Fuchun Peng, Hasim Sak, Françoise Beaufays
ICASSP (2015)
Improved recognition of contact names in voice commands
Petar Aleksic, Cyril Allauzen, David Elson, Aleks Kracun, Diego Melendo Casado, Pedro J. Moreno
Language Modeling in the Era of Abundant Data
Stanford Information Theory Forum (2015)
Large Vocabulary Automatic Speech Recognition for Children
Hank Liao, Golan Pundak, Olivier Siohan, Melissa Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani
Interspeech (2015)
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Arun Narayanan, Ananya Misra, Kean Chin
INTERSPEECH-2015, ISCA, pp. 3571-3575
Learning acoustic frame labeling for speech recognition with recurrent neural networks
Hasim Sak, Andrew W. Senior, Kanishka Rao, Ozan Irsoy, Alex Graves, Françoise Beaufays, Johan Schalkwyk
ICASSP (2015), pp. 4280-4284
Learning the Speech Front-end with Raw Waveform CLDNNs
Tara Sainath, Ron J. Weiss, Kevin Wilson, Andrew W. Senior, Oriol Vinyals
Interspeech (2015)
William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
CoRR, vol. abs/1508.01211 (2015)
Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition
Yu-hsin Chen, Ignacio Lopez Moreno, Tara Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada
Interspeech (2015)
Long Short-Term Memory Language Models with Additive Morphological Features for Automatic Speech Recognition
Daniel Renshaw, Keith B. Hall
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015)
Multi-Dialectical Languages Effect on Speech Recognition
Mohamed Elfeky, Pedro J. Moreno, Victor Soto
International Conference on Natural Language and Speech Processing (2015)
Multitask learning and system combination for automatic speech recognition
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Pruning Sparse Non-negative Matrix N-gram Language Models
Joris Pelemans, Noam M. Shazeer, Ciprian Chelba
Proceedings of Interspeech 2015, ISCA, pp. 1433-1437
Query-by-Example Keyword Spotting Using Long Short-Term Memory Networks
Guoguo Chen, Carolina Parada, Tara N. Sainath
ICASSP (2015)
Rapid Vocabulary Addition to Context-Dependent Decoder Graphs
Interspeech 2015
Sequence-based Class Tagging for Robust Transcription in ASR
Lucy Vasserman, Vlad Schogol, Keith Hall
Interspeech 2015, International Speech Communication Association (to appear)
Chanwoo Kim, Kean Chin
INTERSPEECH 2015, pp. 751-755
Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data
Ciprian Chelba, Noam M. Shazeer
Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)
Speaker Location and Microphone Spacing Invariant Acoustic Modeling from Raw Multichannel Waveforms
Tara N. Sainath, Ron J. Weiss, Kevin Wilson, Arun Narayanan, Michiel Bacchiani, Andrew Senior
ASRU (2015)
Speech Acoustic Modeling from Raw Multichannel Waveforms
Yedid Hoshen, Ron Weiss, Kevin W Wilson
International Conference on Acoustics, Speech, and Signal Processing, IEEE (2015)
Statistical parametric speech synthesis: from HMM to LSTM-RNN
RTTH Summer School on Speech Technology -- A Deep Learning Perspective, Barcelona, Spain (2015)
Sahar Akram, Alain de Cheveigné, Peter Udo Diehl, Emily Graber, Carina Graversen, Jens Hjortkjaer, Nima Mesgarani, Lucas Parra, Ulrich Pomper, Shihab Shamma, Jonathan Simon, Malcolm Slaney, Daniel Wong
Institute for Neuroinformatics (2015)
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4470-4474
ViSQOL: an objective speech quality model
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015 (13) (2015), pp. 1-18
Vocaine the Vocoder and Applications in Speech Synthesis
ICASSP, IEEE (2015) (to appear)
A big data approach to acoustic model training corpus selection
Olga Kapralova, John Alex, Eugene Weinstein, Pedro Moreno, Olivier Siohan
Conference of the International Speech Communication Association (Interspeech) (2014)
An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech
Alastair H Moore, Patrick A Naylor, Jan Skoglund
Proceedings of European Signal Processing Conference (EUSIPCO) 2014
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks
Georg Heigold, Erik McDermott, Vincent Vanhoucke, Andrew Senior, Michiel Bacchiani
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Firenze, Italy (2014)
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data
Erik McDermott, Georg Heigold, Pedro Moreno, Andrew Senior, Michiel Bacchiani
Interspeech, ISCA (2014)
Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition
M. Bacchiani, A. Senior, G. Heigold
Proceedings of the European Conference on Speech Communication and Technology (2014) (to appear)
Automatic Language Identification Using Deep Neural Networks
Ignacio Lopez-Moreno, Javier Gonzalez-Dominguez, Oldrich Plchot
Proc. ICASSP, IEEE (2014)
Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Hasim Sak
Interspeech (2014)
Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models
Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton
Proceedings of Interspeech 2014
Backoff Inspired Features for Maximum Entropy Language Models
Fadi Biadsy, Keith Hall, Pedro Moreno, Brian Roark
Proceedings of Interspeech, ISCA (2014)
Computer-aided quality assurance of an Icelandic pronunciation dictionary
LREC 2014, Reykjavik
Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
Deep Mixture Density Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 3872-3876
Deep Neural Networks for Small Footprint Text-dependent Speaker Verification
Ehsan Variani, Xin Lei, Erik McDermott, Ignacio Lopez Moreno, Javier Gonzalez-Dominguez
Proc. ICASSP, IEEE (2014)
Direct construction of compact context-dependency transducers from data
David Rybach, Michael Riley, Chris Alberti
Computer Speech & Language, vol. 28 (2014), pp. 177-191
Discriminative pronunciation modeling for dialectal speech recognition
Maider Lehr, Kyle Gorman, Izhak Shafran
Proc. Interspeech (2014) (to appear)
Encoding Linear Models As Weighted Finite-State Transducers
Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark
Interspeech 2014, ISCA, pp. 1258-1262
Fine Context, Low-rank, Softplus Deep Neural Networks for Mobile Speech Recognition
Andrew Senior, Xin Lei
Proc. ICASSP (2014) (to appear)
Frame by Frame Language Identification in Short Utterances using Deep Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez
Neural Networks Special Issue: Neural Network Learning in Big Data (2014)
GMM-Free DNN Training
A. Senior, G. Heigold, M. Bacchiani, H. Liao
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
Improving DNN Speaker Independence with I-vector Inputs
Andrew Senior, Ignacio Lopez-Moreno
Proc. ICASSP, IEEE (2014)
JustSpeak: Enabling Universal Voice Control on Android
Yu Zhong, T. V. Raman, Casey Burkhardt, Fadi Biadsy, Jeffrey P. Bigham
W4A 2014
Large-Scale Speaker Identification
Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno
Proc. ICASSP, IEEE (2014)
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
Hasim Sak, Andrew W. Senior, Françoise Beaufays
CoRR, vol. abs/1402.1128 (2014)
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Hasim Sak, Andrew W. Senior, Françoise Beaufays
INTERSPEECH (2014), pp. 338-342
Pronunciation Learning for Named-Entities through Crowd-Sourcing
Attapol Rutherford, Fuchun Peng, Françoise Beaufays
Proceedings of Interspeech (2014)
Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern
INTERSPEECH (2014), pp. 2715-2718
Robust speech recognition using temporal masking and thresholding algorithm
Chanwoo Kim, Kean Chin, Michiel Bacchiani, R. M. Stern
INTERSPEECH (2014), pp. 2734-2738
Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks
Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao
Interspeech (2014)
Sinusoidal Interpolation Across Missing Data
W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75
Small-Footprint Keyword Spotting using Deep Neural Networks
Guoguo Chen, Carolina Parada, Georg Heigold
Statistical Parametric Speech Synthesis
UKSpeech Conference, Edinburgh, UK (2014)
Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models
Xavi Gonzalvo, Monika Podsiadlo
Proceedings of Interspeech, ISCA (2014)
Training Data Selection Based On Context-Dependent State Matching
Proceedings of ICASSP 2014
Word Embeddings for Speech Recognition
Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech (2014)
Yannis Agiomyrgiannakis, Florian Eyben
Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices
Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen
Interspeech (2013)
An Empirical study of learning rates in deep neural networks for speech recognition
Andrew Senior, Georg Heigold, Marc'aurelio Ranzato, Ke Yang
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)
Deep Learning in Speech Synthesis
8th ISCA Speech Synthesis Workshop, Barcelona, Spain (2013)
Deep Neural Networks with Auxiliary Gaussian Mixture Models for Real-Time Speech Recognition
Xin Lei, Hui Lin, Georg Heigold
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
Ciprian Chelba, Johan Schalkwyk
Mobile Speech and Advanced Natural Language Solutions, Springer Science+Business Media, New York (2013), pp. 197-229
Language Model Verbalization for Automatic Speech Recognition
Hasim Sak, Françoise Beaufays, Kaisuke Nakajima, Cyril Allauzen
Proc ICASSP, IEEE (2013)
Language Modeling Capitalization
Françoise Beaufays, Brian Strope
Proc ICASSP, IEEE (2013) (to appear)
Large Scale Distributed Acoustic Modeling With Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169
Large Scale Distributed Acoustic Modeling With Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
ICSI, Berkeley, California (2013)
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
Hank Liao, Erik McDermott, Andrew Senior
ASRU (2013)
Mixture of mixture n-gram language models
Hasim Sak, Cyril Allauzen, Kaisuke Nakajima, Françoise Beaufays
ASRU (2013), pp. 31-36
Monitoring the Effects of Temporal Clipping on VoIP Speech Quality
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
Interspeech 2013, pp. 1188-1192
Multiframe Deep Neural Networks for Acoustic Modeling
Vincent Vanhoucke, Matthieu Devin, Georg Heigold
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
Multilingual acoustic models using distributed deep neural networks
Georg Heigold, Vincent Vanhoucke, Andrew Senior, Patrick Nguyen, Marc'aurelio Ranzato, Matthieu Devin, Jeff Dean
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
On Rectified Linear Units For Speech Processing
M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton
38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)
Pre-Initialized Composition for Large-Vocabulary Speech Recognition
Interspeech 2013, pp. 666-670
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2013)
Rate-Distortion Optimization for Multichannel Audio Compression
Minyue Li, Jan Skoglund, W. Bastiaan Kleijn
2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Recurrent Neural Networks for Voice Activity Detection
Thad Hughes, Keir Mierle
ICASSP, IEEE (2013), pp. 7378-7382
Robustness of Speech Quality Metrics to Background Noise and Network Degradations: Comparing VISQOL, PESQ and POLQA
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 3697-3701
Search Results Based N-Best Hypothesis Rescoring With Maximum Entropy Classification
Fuchun Peng, Scott Roy, Ben Shahshahani, Françoise Beaufays
Proceedings of ASRU (2013)
Smoothed marginal distribution constraints for language modeling
Brian Roark, Cyril Allauzen, Michael Riley
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52
Speaker Adaptation of Context Dependent Deep Neural Networks
International Conference on Acoustics, Speech, and Signal Processing (2013)
Speech and Natural Language: Where Are We Now And Where Are We Headed?
Mobile Voice Conference, San Francisco (2013)
Statistical Parametric Speech Synthesis Using Deep Neural Networks
Heiga Zen, Andrew Senior, Mike Schuster
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 7962-7966
Written-Domain Language Modeling for Automatic Speech Recognition
Hasim Sak, Yun-hsuan Sung, Françoise Beaufays, Cyril Allauzen
Interspeech (2013)
iVector-based Acoustic Data Selection
Olivier Siohan, Michiel Bacchiani
Proceedings of Interspeech (2013)
Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition
Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke
Proceedings of Interspeech 2012
Building adaptive dialogue systems via Bayes-adaptive POMDP
Shaowei Png, Joelle Pineau, B. Chaib-draa
IEEE Journal of Selected Topics in Signal Processing, vol. 6(8) (2012), pp. 917-927
Chapter 17: Uncertainty Decoding, In Virtanen, Singh, & Raj (Eds.) Techniques for Noise Robustness in Automatic Speech Recognition.
Wiley (2012), pp. 463-485
Continuous Space Discriminative Language Modeling
Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud’hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark, Kenji Sagae, Murat Saraçlar, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury
Signal Processing Magazine (2012)
Distributed Acoustic Modeling with Back-off N-grams
Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson
Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
Distributed Discriminative Language Models for Google Voice Search
Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope
Proceedings of ICASSP 2012, IEEE, pp. 5017-5021
Estimating Word-Stability During Incremental Speech Recognition
Ian McGraw, Alexander Gruenstein
Interspeech (2012)
Exemplar-Based Processing for Speech Recognition: An Overview
Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Dirk Van Compernolle, Kris Demuynck, Jort F. Gemmeke, Jerome R. Bellegarda, Shiva Sundaram
IEEE Signal Process. Mag., vol. 29 (2012), pp. 98-113
Google's Cross-Dialect Arabic Voice Search
Fadi Biadsy, Pedro J. Moreno, Martin Jansche
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444
Hallucinated N-Best Lists for Discriminative Language Modeling
Kenji Sagae, Maider Lehr, Emily Tucker Prud’hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraçlar, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)
Haptic Voice Recognition Grand Challenge
K. Sim, S. Zhao, K. Yu, H. Liao
14th ACM International Conference on Multimodal Interaction (2012)
Bastiaan Kleijn, Jan Skoglund
International Workshop on Acoustic Signal Enhancement 2012 (IWAENC 2012)
Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440
Japanese and Korean Voice Search
Mike Schuster, Kaisuke Nakajima
International Conference on Acoustics, Speech and Signal Processing, IEEE (2012), pp. 5149-5152
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Leif Johnson, Michael Riley, Peng Xu, Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt
University of Toronto (2012)
Large Scale Language Modeling in Automatic Speech Recognition
Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar
Google (2012)
Large-scale Discriminative Language Model Reranking for Voice Search
Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics, pp. 41-49
Learning improved linear transforms for speech recognition
Andrew Senior, Youngmin Cho, Jason Weston
Music Models for Music-Speech Separation
Thad Hughes, Trausti Kristjansson
ICASSP, IEEE (2012), pp. 4917-4920
Optimal Size, Freshness and Time-frame for Voice Search Vocabulary
Google (2012)
Recognition of Multilingual Speech in Mobile Applications
Hui Lin, Jui-Ting Huang, Francoise Beaufays, Brian Strope, Yun-hsuan Sung
ICASSP (2012)
Recurrent Neural Networks for Noise Reduction in Robust ASR
Andrew Maas, Quoc V. Le, Tyler M. O’Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng
Semi-supervised Discriminative Language Modeling for Turkish ASR
Murat Saraçlar, Daniel M. Bikel, Keith Hall, Kenji Sagae
2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, IEEE, Kyoto, Japan
Spectral Intersections for Non-Stationary Signal Separation
Trausti Kristjansson, Thad Hughes
Proceedings of InterSpeech 2012, Portland, OR
Speech/Nonspeech Segmentation in Web Videos
Proceedings of InterSpeech 2012
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
International Workshop on Acoustic Signal Enhancement 2012 (IWAENC 2012)
Voice Query Refinement
Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk
Interspeech (2012)
A Web-Based Tool for Developing Multilingual Pronunciation Lexicons
Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa
12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332
Bayesian Language Model Interpolation for Mobile Speech Input
Interspeech 2011, pp. 1429-1432
Deploying Google Search by Voice in Cantonese
Yun-hsuan Sung, Martin Jansche, Pedro Moreno
12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868
Discriminative Features for Language Identification
C. Alberti, M. Bacchiani
Improving the speed of neural networks on CPUs
Vincent Vanhoucke, Andrew Senior, Mark Z. Mao
Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Michael Riley, Peng Xu, Thorsten Brants, Vida Ha, Will Neveitt
OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)
Recognizing English Queries in Mandarin Voice Search
Hung-An Chang, Yun-hsuan Sung, Brian Strope, Francoise Beaufays
ICASSP (2011)
Speech Retrieval
Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar
Spoken Language Understanding, John Wiley and Sons, Ltd (2011), pp. 417-446
Summary of Opus listening test results
Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund
IETF (2011)
TechWare: Mobile Media Search Resources [Best of the Web]
Z. Liu, M. Bacchiani
IEEE Signal Processing Magazine, vol. 28 (2011), pp. 142-145
Unsupervised Testing Strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei
Interspeech 2011, pp. 1685-1688
Challenges in Automatic Speech Recognition
Ciprian Chelba, Johan Schalkwyk, Michiel Bacchiani
Interspeech 2010
Decision Tree State Clustering with Word and Syllable Features
Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan
Interspeech, ISCA (2010), pp. 2958-2961
Discriminative Topic Segmentation of Text and Speech
Mehryar Mohri, Pedro Moreno, Eugene Weinstein
International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
Google Search by Voice: A Case Study
Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope
Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90
On-Demand Language Model Interpolation for Mobile Speech Input
Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk
Interspeech (2010), pp. 1812-1815
Search by Voice in Mandarin Chinese
Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno
Interspeech 2010, pp. 354-357
Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models
Francoise Beaufays, Vincent Vanhoucke, Brian Strope
Proc Interspeech (2010)
A new quality measure for topic segmentation of text and speech
Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein
Conference of the International Speech Communication Association (Interspeech) (2009)
Restoring Punctuation and Capitalization in Transcribed Speech
Agustín Gravano, Martin Jansche, Michiel Bacchiani
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4741-4744
Revisiting Graphemes with Increasing Amounts of Data
Yun-Hsuan Sung, Thad Hughes, Francoise Beaufays, Brian Strope
Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292
Confidence Scores for Acoustic Model Adaptation
C. Gollan, M. Bacchiani
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2008)
Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing
Michiel Bacchiani, Francoise Beaufays, Johan Schalkwyk, Mike Schuster, Brian Strope
Proc. ICASSP (2008)
Retrieval and Browsing of Spoken Content
Ciprian Chelba, Timothy J. Hazen, Murat Saraçlar
Signal Processing Magazine, IEEE, vol. 25 (2008), pp. 39-49
Speech Recognition with Weighted Finite-State Transducers
Mehryar Mohri, Fernando C. N. Pereira, Michael Riley
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
Speech Recognition with Weighted Finite-State Transducers
Mehryar Mohri, Fernando C. N. Pereira, Michael Riley
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)