Speech Processing

Our goal in Speech Technology Research is twofold: to make speaking to the devices around you (at home, in the car), the devices you wear (watch) and the devices you carry (phone, tablet) both ubiquitous and seamless.

Our research focuses on what makes Google unique: computing scale and data. Using large-scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and to experiment with the kinds of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: How do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which classes of algorithms merely compensate for a lack of data, and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field in directions that would never have been considered were it not for the availability of orders of magnitude more data.

We are also in a unique position to deliver very user-centric research. Researchers can draw on the millions of users who talk to Voice Search or Android Voice Input every day, and can conduct live experiments to test and benchmark new algorithms directly in a realistic, controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we keep our users very close to make sure we solve real problems and have real impact.

We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance in every language on the planet. We currently have systems operating in more than 55 languages, and we keep expanding our reach to more and more users. The challenge of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground prompts new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, this work puts us at the forefront of research in speech-to-speech translation and one step closer to a universal translator.

Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and it is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day on YouTube range from lectures to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation and language detection to new levels of difficulty. The payoff is immense: imagine making every lecture on the web accessible in every language; this is the kind of impact we are striving for.

264 Publications

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding

Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

Proceedings of the IEEE ICASSP (2018)
An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model

Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar

ICASSP (2018)
Decoding the auditory brain with canonical component analysis

Alain de Cheveigné, Daniel D. E. Wong, Giovanni M. Di Liberto, Jens Hjortkjaer, Malcolm Slaney, Edmund Lalor

NeuroImage (2018)
Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Rohit Prabhavalkar, Tara Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

ICASSP 2018 (to appear)
Multilingual Speech Recognition with a Single End-to-End Model

Shubham Toshniwal, Tara N. Sainath, Ron Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

ICASSP (2018)
On Using Backpropagation for Speech Texture Generation and Voice Conversion

Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio

ICASSP (2018)
Sound source separation using phase difference and reliable mask selection

Chanwoo Kim, Anjali Menon, Michiel Bacchiani, Richard M. Stern

ICASSP (2018) (to appear)
Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition

Chanwoo Kim, Tara Sainath, Arun Narayanan, Ananya Misra, Rajeev Nongpiur, Michiel Bacchiani

ICASSP 2018 (2018)
State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Chung-Cheng Chiu, Tara Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Katya Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

ICASSP (2018) (to appear)
A Cascade Architecture for Keyword Spotting on Mobile Devices

Alexander Gruenstein, Raziel Alvarez, Chris Thornton, Mohammadali Ghodrat

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA (2017)
A Comparison of Sequence-to-Sequence Models for Speech Recognition

Rohit Prabhavalkar, Kanishka Rao, Tara Sainath, Bo Li, Leif Johnson, Navdeep Jaitly

Interspeech 2017, ISCA (2017)
A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition

Herman Kamper, Aren Jansen, Sharon Goldwater

Computer Speech and Language (2017) (to appear)
A more general method for pronunciation learning

Antoine Bruguier, Dan Gnanapragasam, Francoise Beaufays, Kanishka Rao, Leif Johnson

Interspeech 2017 (2017)
Acoustic Modeling for Google Home

Bo Li, Tara Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean Chin, Khe Chai Sim, Ron J. Weiss, Kevin Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Rick Rose, Matt Shannon

INTERSPEECH 2017 (2017)
An Analysis of "Attention" in Sequence-to-Sequence Models

Rohit Prabhavalkar, Tara Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly

Interspeech 2017, ISCA (2017)
Approaches for Neural-Network Language Model Adaptation

Fadi Biadsy, Michael Alexander Nirschl, Min Ma, Shankar Kumar

Interspeech 2017, Stockholm, Sweden (2017)
Areal and Phylogenetic Features for Multilingual Speech Synthesis

Alexander Gutkin, Richard Sproat

Proc. of Interspeech 2017, ISCA, August 20–24, 2017, Stockholm, Sweden, pp. 2078-2082
Attention-Based Models for Text-Dependent Speaker Verification

F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

(2017)
Binaural processing for robust speech recognition of degraded speech

Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern

IEEE Automatic Speech Recognition and Understanding Workshop (2017)
Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals

Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro

Interspeech 2017 (2017)
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

Chanwoo Kim, Ehsan Variani, Arun Narayanan, Michiel Bacchiani

arxiv (2017)
End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow

Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani

Interspeech 2017 (2017)
Endpoint detection using grid long short-term memory networks for streaming speech recognition

Bo Li, Carolina Parada, Gabor Simko, Shuo-yiin Chang, Tara Sainath

In Proc. Interspeech 2017 (to appear)
Generalized End-to-End Loss for Speaker Verification

Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

(2017)
Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home

Chanwoo Kim, Ananya Misra, Kean Chin, Thad Hughes, Arun Narayanan, Tara Sainath, Michiel Bacchiani

Interspeech 2017 (2017), pp. 379-383
Generative Model-Based Text-to-Speech Synthesis

Heiga Zen

MIT (2017)
Google's next-generation real-time unit-selection synthesizer using sequence-to-sequence LSTM-based autoencoders

Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silen, Jakub Vit

Interspeech (2017)
Highway-LSTM and Recurrent Highway Networks for Speech Recognition

Golan Pundak, Tara Sainath

Proc. Interspeech 2017, ISCA
Human and Machine Hearing: Extracting Meaning from Sound

Richard F. Lyon

Cambridge University Press (2017)
Improved end-of-query detection for streaming speech recognition

Carolina Parada, Gabor Simko, Matt Shannon, Shuo-yiin Chang

Proc. Interspeech 2017 (2017) (to appear)
Incoherent idempotent ambisonics rendering

W. Bastiaan Kleijn, Andrew Allen, Jan Skoglund, Felicia Lim

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Joint Wideband Source Localization and Acquisition Based on a Grid-Shift Approach

Christos Tzagkarakis, Bastiaan Kleijn, Jan Skoglund

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (2017)
Keyword Spotting for Google Assistant Using Contextual Speech Recognition

Assaf Michaely, Carolina Parada, Frank Zhang, Gabor Simko, Petar Aleksic

ASRU 2017, IEEE
Language Modeling in the Era of Abundant Data

Ciprian Chelba

AI With the Best online conference (2017)
Latent Sequence Decompositions

William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly

ICLR (2017)
Multi-Accent Speech Recognition with Hierarchical Grapheme Based Models

Hasim Sak, Kanishka Rao

ICASSP 2017 (to appear)
Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

Tara Sainath, Ron J. Weiss, Kevin Wilson, Bo Li, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25 (2017), pp. 965-979
On Lattice Generation for Large Vocabulary Speech Recognition

David Rybach, Johan Schalkwyk, Michael Riley

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan (2017)
Optimizing expected word error rate via sampling for speech recognition

Matt Shannon

Proc. Interspeech 2017 (2017) (to appear)
Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Aäron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis Carlos Cobo Rus, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alexander Graves, Helen King, Thomas Walters, Dan Belov, Demis Hassabis

Google DeepMind (2017)
Practically Efficient Nonlinear Acoustic Echo Cancellers Using Cascaded Block RLS and FLMS Adaptive Filters

Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

ICASSP (2017)
Raw Multichannel Processing Using Deep Neural Networks

Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Bo Li, Ehsan Variani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim

New Era for Robust Speech Recognition: Exploiting Deep Learning, Springer (2017)
Robust Speech Recognition Based on Binaural Auditory Processing

Anjali Menon, Chanwoo Kim, Richard M. Stern

INTERSPEECH 2017 (2017), pp. 3872-3876
Robust and low-complexity blind source separation for meeting rooms

W. Bastiaan Kleijn, Felicia Lim

Proceedings Fifth Joint Workshop on Hands-free Speech Communication and Microphone Arrays (2017)
Sparse Non-negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap

Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy

The 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 2725-2729 (to appear)
Speaker Diarization with LSTM

Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

(2017)
Streaming Small-Footprint Keyword Spotting Using Sequence-to-Sequence Models

Yanzhang (Ryan) He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw

Automatic Speech Recognition and Understanding (ASRU), 2017 IEEE Workshop on
Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM

Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro Moreno

ASRU 2017
Tacotron: Towards End-to-End Speech Synthesis

Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous

Interspeech (2017)
Trainable Frontend For Robust and Far-Field Keyword Spotting

Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous

Proc. IEEE ICASSP 2017, New Orleans, LA
Uncovering Latent Style Factors for Expressive Speech Synthesis

Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous

NIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio) (2017) (to appear)
Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Alexander Gutkin

Proc. of Interspeech 2017, ISCA, August 20–24, Stockholm, Sweden, pp. 2183-2187
Very Deep Convolutional Networks for End-to-End Speech Recognition

Yu Zhang, William Chan, Navdeep Jaitly

ICASSP (2017)
Wavenet based low rate speech coding

W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters

arXiv preprint arXiv:1712.01120 (2017)
A subband-based stationary-component suppression method using harmonics and power ratio for reverberant speech recognition

Byung Joon Cho, Haeyong Kwon, Ji-Won Cho, Chanwoo Kim, Richard M. Stern, Hyung-Min Park

IEEE Signal Processing Letters, vol. 23 (2016), pp. 780-784
An Acoustic Keystroke Transient Canceler for Speech Communication Terminals Using a Semi-Blind Adaptive Filter Model

Herbert Buchner, Simon Godsill, Jan Skoglund

ICASSP (2016)
AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley

NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop (to appear)
Automatic Optimization of Data Perturbation Distributions for Multi-Style Training in Speech Recognition

Mortaza Doulaty, Richard Rose, Olivier Siohan

Proceedings of the IEEE 2016 Workshop on Spoken Language Technology (SLT2016)
Bi-Magnitude Processing Framework for Nonlinear Acoustic Echo Cancellation on Android Devices

Yiteng (Arden) Huang, Jan Skoglund, Alejandro Luebs

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC2016)
Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla

Alexander Gutkin, Linne Ha, Martin Jansche, Oddur Kjartansson, Knot Pipatsrisawat, Richard Sproat

SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, 09-12 May 2016, Yogyakarta, Indonesia; Procedia Computer Science, Elsevier B.V., pp. 194-200
Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling

Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani

Interspeech 2016 (2016)
Contextual prediction models for speech recognition

Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Baeuml

Proceedings of Interspeech 2016
Cross-lingual projection for class-based language models

Beat Gfeller, Vlad Schogol, Keith Hall

ACL2016
Directly Modeling Voiced and Unvoiced Components in Speech Waveforms by Neural Networks

Keiichi Tokuda, Heiga Zen

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2016), pp. 5640-5644
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition

Austin Waters, Yevgen Chebotar

Interspeech (2016)
Distributed representation and estimation of WFST-based n-gram models

Cyril Allauzen, Michael Riley, Brian Roark

Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41
End-to-End Text-Dependent Speaker Verification

Georg Heigold, Ignacio Moreno, Samy Bengio, Noam M. Shazeer

International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs

Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani

International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices

Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak

Proc. Interspeech, San Francisco, CA, USA (2016)
Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection

Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada
Flatstart-CTC: a new acoustic model training procedure for speech recognition

Andrew Senior, Hasim Sak, Kanishka Rao

ICASSP 2016
Globally Optimized Least-Squares Post-Filtering for Microphone Array Speech Enhancement

Yiteng (Arden) Huang, Alejandro Luebs, Jan Skoglund, W. Bastiaan Kleijn

ICASSP (2016)
High quality agreement-based semi-supervised training data for acoustic modeling

Félix de Chaumont Quitry, Asa Oines, Pedro Moreno, Eugene Weinstein

2016 IEEE Workshop on Spoken Language Technology
Learning Compact Recurrent Neural Networks

Zhiyun Lu, Vikas Sindhwani, Tara Sainath

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Learning N-gram Language Models from Uncertain Data

Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark

Interspeech (2016)
Learning Personalized Pronunciations for Contact Names Recognition

Tony Bruguier, Fuchun Peng, Francoise Beaufays

Interspeech 2016 (to appear)
Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition

William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

ICASSP (2016)
Lower Frame Rate Neural Network Acoustic Models

Golan Pundak, Tara Sainath

Interspeech (2016)
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks

Tara N. Sainath, Bo Li

Proc. Interspeech, ISCA (2016) (to appear)
Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN based Statistical Parametric Speech Synthesis

Bo Li, Heiga Zen

Proc. Interspeech, ISCA (2016) (to appear)
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition

Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani

Proc. Interspeech, ISCA (2016)
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

Hagen Soltau, Hank Liao, Hasim Sak

ArXiv e-prints (2016)
On Pre-Filtering Strategies for the GCC-PHAT Algorithm

Hong-Goo Kang, Michael Graczyk, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2016 (IWAENC 2016)
On The Compression Of Recurrent Neural Networks With An Application To LVCSR Acoustic Modeling For Embedded Speech Recognition

Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
On the Efficient Representation and Execution of Deep Acoustic Models

Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin

Proceedings of Annual Conference of the International Speech Communication Association (Interspeech) (2016)
Personalized Speech Recognition On Mobile Devices

Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

Chanwoo Kim, Richard M. Stern

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24 (2016), pp. 1315-1329
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

Daan van Esch, Kanishka Rao, Mason Chua

Proceedings of InterSpeech 2016 (to appear)
Pynini: A Python library for weighted finite-state grammar compilation

Kyle Gorman

Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (2016), pp. 75-80
Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer

Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silen

INTERSPEECH 2016, Sep 8-12, San Francisco, USA, pp. 2238-2242
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction

Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran

Proc. Interspeech, ISCA (2016)
Robust Estimation of Reverberation Time Using Polynomial Roots

Ian Kelly, Francis Boland, Jan Skoglund

AES 60th Conference on Dereverberation and Reverberation of Audio, Music, and Speech, Google Ireland Ltd. (2016)
Selection and Combination of Hypotheses for Dialectal Speech Recognition

Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro J. Moreno

ICASSP 2016
Semantic Model for Fast Tagging of Word Lattices

Leonid Velikovich

IEEE Spoken Language Technology (SLT) Workshop (2016) (to appear)
The Matching-Minimization Algorithm, the INCA Algorithm and a Mathematical Framework for Voice Conversion with Unaligned Corpora

Yannis Agiomyrgiannakis

ICASSP, IEEE (2016)
TTS for Low Resource Languages: A Bangla Synthesizer

Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, Richard Sproat

10th edition of the Language Resources and Evaluation Conference, 23-28 May 2016, European Language Resources Association (ELRA), Portorož, Slovenia, pp. 2005-2010
Towards Acoustic Model Unification Across Dialects

Austin Waters, Meysam Bastani, Mohamed G. Elfeky, Pedro Moreno, Xavier Velez

2016 IEEE Workshop on Spoken Language Technology
Unsupervised Context Learning For Speech Recognition

Assaf Michaely, Justin Scheiner, Mohammadreza Ghodsi, Petar Aleksic, Zelin Wu

Spoken Language Technology (SLT) Workshop, IEEE (2016)
Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings

Aren Jansen, Herman Kamper, Sharon Goldwater

IEEE Transactions on Audio, Speech, and Language Processing (2016)
Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis

Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen

Proc. ISCA SSW9 (2016), pp. 238-245
Voice Morphing That Improves TTS Quality Using an Optimal Dynamic Frequency Warping-and-Weighting Transform

Yannis Agiomyrgiannakis, Zoe Roupakia

ICASSP, IEEE (2016)
A 6 µW per Channel Analog Biomimetic Cochlear Implant Processor Filterbank Architecture With Across Channels AGC

Guang Wang, Richard F. Lyon, Emmanuel M. Drakakis

IEEE Transactions on Biomedical Circuits and Systems, vol. 9 (2015), pp. 72-86
A Gaussian Mixture Model Layer Jointly Optimized with Discriminative Features within A Deep Neural Network Architecture

Ehsan Variani, Erik McDermott, Georg Heigold

ICASSP, IEEE (2015)
Acoustic Modeling for Speech Synthesis: from HMM to RNN

Heiga Zen

IEEE ASRU, Scottsdale, Arizona, U.S.A. (2015)
Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN

Heiga Zen

Proc. MLSLP (2015)
Acoustic Modelling with CD-CTC-SMBR LSTM RNNS

Andrew Senior, Hasim Sak, Felix de Chaumont Quitry, Tara N. Sainath, Kanishka Rao

ASRU (2015)
Automatic Gain Control and Multi-style Training for Robust Small-Footprint Keyword Spotting with Deep Neural Networks

Rohit Prabhavalkar, Raziel Alvarez, Carolina Parada, Preetum Nakkiran, Tara Sainath

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2015), pp. 4704-4708
Automatic Pronunciation Verification for Speech Recognition

Kanishka Rao, Fuchun Peng, Françoise Beaufays

ICASSP (2015)
Bringing Contextual Information to Google Speech Recognition

Petar Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith Hall, Brian Roark, David Rybach, Pedro Moreno

Interspeech 2015, International Speech Communication Association
Composition-based on-the-fly rescoring for salient n-gram biasing

Keith Hall, Eunjoon Cho, Cyril Allauzen, Francoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang

Interspeech 2015, International Speech Communication Association
Compressing Deep Neural Networks using a Rank-Constrained Topology

Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada

Proceedings of Annual Conference of the International Speech Communication Association (Interspeech), ISCA (2015), pp. 1473-1477
Context dependent phone models for LSTM RNN acoustic modelling

Andrew W. Senior, Hasim Sak, Izhak Shafran

ICASSP (2015), pp. 4585-4589
Convolutional Neural Networks for Small-Footprint Keyword Spotting

Tara Sainath, Carolina Parada

Interspeech (2015)
Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks

Tara Sainath, Oriol Vinyals, Andrew Senior, Hasim Sak

ICASSP (2015)
Detection and Suppression of Keyboard Transient Noise in Audio Streams with Auxiliary Keybed Microphone

Simon Godsill, Herbert Buchner, Jan Skoglund

ICASSP 2015, IEEE
Direct-to-Reverberant Ratio Estimation Using a Null-Steered Beamformer

James Eaton, Alastair Moore, Patrick Naylor, Jan Skoglund

ICASSP 2015, IEEE
Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, Li Deng

IEEE Signal Processing Magazine, vol. 32 (2015), pp. 35-52
Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis

Keiichi Tokuda, Heiga Zen

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4215-4219
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

Hasim Sak, Andrew W. Senior, Kanishka Rao, Françoise Beaufays

CoRR, vol. abs/1507.06947 (2015)
Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs

Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman

ICASSP (2015)
Garbage Modeling for On-device Speech Recognition

Christophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays

Interspeech 2015, International Speech Communication Association (to appear)
Geo-location for Voice Search Language Modeling

Ciprian Chelba, Xuedong Zhang, Keith Hall

Interspeech 2015, International Speech Communication Association, pp. 1438-1442
Grapheme-to-Phoneme Conversion Using Long Short-Term Memory Recurrent Neural Networks

Kanishka Rao, Fuchun Peng, Hasim Sak, Françoise Beaufays

ICASSP (2015)
Improved recognition of contact names in voice commands

Petar Aleksic, Cyril Allauzen, David Elson, Aleks Kracun, Diego Melendo Casado, Pedro J. Moreno

ICASSP 2015
Language Modeling in the Era of Abundant Data

Ciprian Chelba

Stanford Information Theory Forum (2015)
Large Vocabulary Automatic Speech Recognition for Children

Hank Liao, Golan Pundak, Olivier Siohan, Melissa Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani

Interspeech (2015)
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR

Arun Narayanan, Ananya Misra, Kean Chin

INTERSPEECH-2015, ISCA, pp. 3571-3575
Learning acoustic frame labeling for speech recognition with recurrent neural networks

Hasim Sak, Andrew W. Senior, Kanishka Rao, Ozan Irsoy, Alex Graves, Françoise Beaufays, Johan Schalkwyk

ICASSP (2015), pp. 4280-4284
Learning the Speech Front-end with Raw Waveform CLDNNs

Tara Sainath, Ron J. Weiss, Kevin Wilson, Andrew W. Senior, Oriol Vinyals

Interspeech (2015)
Listen, Attend and Spell

William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

CoRR, vol. abs/1508.01211 (2015)
Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition

Yu-hsin Chen, Ignacio Lopez Moreno, Tara Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada

Interspeech (2015)
Long Short-Term Memory Language Models with Additive Morphological Features for Automatic Speech Recognition

Daniel Renshaw, Keith B. Hall

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2015)
Multi-Dialectical Languages Effect on Speech Recognition

Mohamed Elfeky, Pedro J. Moreno, Victor Soto

International Conference on Natural Language and Speech Processing (2015)
Multitask learning and system combination for automatic speech recognition

Olivier Siohan, David Rybach

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Pruning Sparse Non-negative Matrix N-gram Language Models

Joris Pelemans, Noam M. Shazeer, Ciprian Chelba

Proceedings of Interspeech 2015, ISCA, pp. 1433-1437
Query-by-Example Keyword Spotting Using Long Short-Term Memory Networks

Guoguo Chen, Carolina Parada, Tara N. Sainath

ICASSP (2015)
Rapid Vocabulary Addition to Context-Dependent Decoder Graphs

Cyril Allauzen, Michael Riley

Interspeech 2015
Sequence-based Class Tagging for Robust Transcription in ASR

Lucy Vasserman, Vlad Schogol, Keith Hall

Interspeech 2015, International Speech Communication Association (to appear)
Sound source separation algorithm using phase difference and angle distribution modeling near the target

Chanwoo Kim, Kean Chin

INTERSPEECH 2015, pp. 751-755
Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data

Ciprian Chelba, Noam M. Shazeer

Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)
Speaker Location and Microphone Spacing Invariant Acoustic Modeling from Raw Multichannel Waveforms

Tara N. Sainath, Ron J. Weiss, Kevin Wilson, Arun Narayanan, Michiel Bacchiani, Andrew Senior

ASRU (2015)
Speech Acoustic Modeling from Raw Multichannel Waveforms

Yedid Hoshen, Ron Weiss, Kevin W Wilson

International Conference on Acoustics, Speech, and Signal Processing, IEEE (2015)
Statistical parametric speech synthesis: from HMM to LSTM-RNN

Heiga Zen

RTTH Summer School on Speech Technology -- A Deep Learning Perspective, Barcelona, Spain (2015)
Telluride Decoding Toolbox

Sahar Akram, Alain de Cheveigné, Peter Udo Diehl, Emily Graber, Carina Graversen, Jens Hjortkjaer, Nima Mesgarani, Lucas Parra, Ulrich Pomper, Shihab Shamma, Jonathan Simon, Malcolm Slaney, Daniel Wong

Institute for Neuroinformatics (2015)
Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis

Heiga Zen, Hasim Sak

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2015), pp. 4470-4474
ViSQOL: an objective speech quality model

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015 (13) (2015), pp. 1-18
Vocaine the Vocoder and Applications in Speech Synthesis

Yannis Agiomyrgiannakis

ICASSP, IEEE (2015) (to appear)
A big data approach to acoustic model training corpus selection

Olga Kapralova, John Alex, Eugene Weinstein, Pedro Moreno, Olivier Siohan

Conference of the International Speech Communication Association (Interspeech) (2014)
An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech

Alastair H Moore, Patrick A Naylor, Jan Skoglund

Proceedings of European Signal Processing Conference (EUSIPCO) 2014
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks

Georg Heigold, Erik McDermott, Vincent Vanhoucke, Andrew Senior, Michiel Bacchiani

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Firenze, Italy (2014)
Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data

Erik McDermott, Georg Heigold, Pedro Moreno, Andrew Senior, Michiel Bacchiani

Interspeech, ISCA (2014)
Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition

M. Bacchiani, A. Senior, G. Heigold

Proceedings of the European Conference on Speech Communication and Technology (2014) (to appear)
Automatic Language Identification Using Deep Neural Networks

Ignacio Lopez-Moreno, Javier Gonzalez-Dominguez, Oldrich Plchot

Proc. ICASSP, IEEE (2014)
Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks

Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Hasim Sak

Interspeech (2014)
Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models

Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton

Proceedings of Interspeech 2014
Backoff Inspired Features for Maximum Entropy Language Models

Fadi Biadsy, Keith Hall, Pedro Moreno, Brian Roark

Proceedings of Interspeech, ISCA (2014)
Computer-aided quality assurance of an Icelandic pronunciation dictionary

Martin Jansche

LREC 2014, Reykjavik
Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models

M. Bacchiani, D. Rybach

Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
Deep Mixture Density Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Heiga Zen, Andrew Senior

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 3872-3876
Deep Neural Networks for Small Footprint Text-dependent Speaker Verification

Ehsan Variani, Xin Lei, Erik McDermott, Ignacio Lopez Moreno, Javier Gonzalez-Dominguez

Proc. ICASSP, IEEE (2014)
Direct construction of compact context-dependency transducers from data

David Rybach, Michael Riley, Chris Alberti

Computer Speech & Language, vol. 28 (2014), pp. 177-191
Discriminative pronunciation modeling for dialectal speech recognition

Maider Lehr, Kyle Gorman, Izhak Shafran

Proc. Interspeech (2014) (to appear)
Encoding Linear Models As Weighted Finite-State Transducers

Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark

Interspeech 2014, ISCA, pp. 1258-1262
Fine Context, Low-rank, Softplus Deep Neural Networks for Mobile Speech Recognition

Andrew Senior, Xin Lei

Proc. ICASSP (2014) (to appear)
Frame by Frame Language Identification in Short Utterances using Deep Neural Networks

Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez

Neural Networks Special Issue: Neural Network Learning in Big Data (2014)
GMM-Free DNN Training

A. Senior, G. Heigold, M. Bacchiani, H. Liao

Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)
Improving DNN Speaker Independence with I-vector Inputs

Andrew Senior, Ignacio Lopez-Moreno

Proc. ICASSP, IEEE (2014)
JustSpeak: Enabling Universal Voice Control on Android

Yu Zhong, T. V. Raman, Casey Burkhardt, Fadi Biadsy, Jeffrey P. Bigham

W4A 2014
Large-Scale Speaker Identification

Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno

Proc. ICASSP, IEEE (2014)
Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition

Hasim Sak, Andrew W. Senior, Françoise Beaufays

CoRR, vol. abs/1402.1128 (2014)
Long short-term memory recurrent neural network architectures for large scale acoustic modeling

Hasim Sak, Andrew W. Senior, Françoise Beaufays

INTERSPEECH (2014), pp. 338-342
Pronunciation Learning for Named-Entities through Crowd-Sourcing

Attapol Rutherford, Fuchun Peng, Françoise Beaufays

Proceedings of Interspeech (2014)
Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression

Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern

INTERSPEECH (2014), pp. 2715-2718
Robust speech recognition using temporal masking and thresholding algorithm

Chanwoo Kim, Kean Chin, Michiel Bacchiani, R. M. Stern

INTERSPEECH-2014, pp. 2734-2738
Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao

Interspeech (2014)
Sinusoidal Interpolation Across Missing Data

W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75
Small-Footprint Keyword Spotting using Deep Neural Networks

Guoguo Chen, Carolina Parada, Georg Heigold

ICASSP, IEEE (2014)
Statistical Parametric Speech Synthesis

Heiga Zen

UKSpeech Conference, Edinburgh, UK (2014)
Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models

Xavi Gonzalvo, Monika Podsiadlo

Proceedings of Interspeech, ISCA (2014)
Training Data Selection Based On Context-Dependent State Matching

Olivier Siohan

Proceedings of ICASSP 2014
Word Embeddings for Speech Recognition

Samy Bengio, Georg Heigold

Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech (2014)
A Frequency-Weighted Post-Filtering Transform for Compensation of the Over-Smoothing Effect in HMM-Based Speech Synthesis

Yannis Agiomyrgiannakis, Florian Eyben

ICASSP, IEEE (2013)
Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices

Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen

Interspeech (2013)
An Empirical study of learning rates in deep neural networks for speech recognition

Andrew Senior, Georg Heigold, Marc'aurelio Ranzato, Ke Yang

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)
Deep Learning in Speech Synthesis

Heiga Zen

8th ISCA Speech Synthesis Workshop, Barcelona, Spain (2013)
Deep Neural Networks with Auxiliary Gaussian Mixture Models for Real-Time Speech Recognition

Xin Lei, Hui Lin, Georg Heigold

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
Empirical Exploration of Language Modeling for the google.com Query Stream as Applied to Mobile Voice Search

Ciprian Chelba, Johan Schalkwyk

Mobile Speech and Advanced Natural Language Solutions, Springer Science+Business Media, New York (2013), pp. 197-229
Language Model Verbalization for Automatic Speech Recognition

Hasim Sak, Françoise Beaufays, Kaisuke Nakajima, Cyril Allauzen

Proc ICASSP, IEEE (2013)
Language Modeling Capitalization

Françoise Beaufays, Brian Strope

Proc ICASSP, IEEE (2013) (to appear)
Large Scale Distributed Acoustic Modeling With Back-off N-grams

Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169
Large Scale Distributed Acoustic Modeling With Back-off N-grams

Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

ICSI, Berkeley, California (2013)
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

Hank Liao, Erik McDermott, Andrew Senior

ASRU (2013)
Mixture of mixture n-gram language models

Hasim Sak, Cyril Allauzen, Kaisuke Nakajima, Françoise Beaufays

ASRU (2013), pp. 31-36
Monitoring the Effects of Temporal Clipping on VoIP Speech Quality

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

Interspeech 2013, pp. 1188-1192
Multiframe Deep Neural Networks for Acoustic Modeling

Vincent Vanhoucke, Matthieu Devin, Georg Heigold

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
Multilingual acoustic models using distributed deep neural networks

Georg Heigold, Vincent Vanhoucke, Andrew Senior, Patrick Nguyen, Marc'aurelio Ranzato, Matthieu Devin, Jeff Dean

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)
On Rectified Linear Units For Speech Processing

M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton

38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)
Pre-Initialized Composition for Large-Vocabulary Speech Recognition

Cyril Allauzen, Michael Riley

Interspeech 2013, pp. 666-670
Rapid Adaptation for Mobile Speech Applications

M. Bacchiani

Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2013)
Rate-Distortion Optimization for Multichannel Audio Compression

Minyue Li, Jan Skoglund, W. Bastiaan Kleijn

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Recurrent Neural Networks for Voice Activity Detection

Thad Hughes, Keir Mierle

ICASSP, IEEE (2013), pp. 7378-7382
Robustness of Speech Quality Metrics to Background Noise and Network Degradations: Comparing VISQOL, PESQ and POLQA

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 3697-3701
Search Results Based N-Best Hypothesis Rescoring With Maximum Entropy Classification

Fuchun Peng, Scott Roy, Ben Shahshahani, Françoise Beaufays

Proceedings of ASRU (2013)
Smoothed marginal distribution constraints for language modeling

Brian Roark, Cyril Allauzen, Michael Riley

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52
Speaker Adaptation of Context Dependent Deep Neural Networks

Hank Liao

International Conference on Acoustics, Speech, and Signal Processing (2013)
Speech and Natural Language: Where Are We Now And Where Are We Headed?

Ciprian Chelba

Mobile Voice Conference, San Francisco (2013)
Statistical Parametric Speech Synthesis Using Deep Neural Networks

Heiga Zen, Andrew Senior, Mike Schuster

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 7962-7966
Written-Domain Language Modeling for Automatic Speech Recognition

Hasim Sak, Yun-hsuan Sung, Françoise Beaufays, Cyril Allauzen

Interspeech (2013)
iVector-based Acoustic Data Selection

Olivier Siohan, Michiel Bacchiani

Proceedings of Interspeech (2013)
Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition

Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke

Proceedings of Interspeech 2012
Building adaptive dialogue systems via Bayes-adaptive POMDP

Shaowei Png, Joelle Pineau, B. Chaib-draa

IEEE Journal of Selected Topics in Signal Processing, vol. 6(8) (2012), pp. 917-927
Chapter 17: Uncertainty Decoding, In Virtanen, Singh, & Raj (Eds.) Techniques for Noise Robustness in Automatic Speech Recognition.

Hank Liao

Wiley (2012), pp. 463-485
Continuous Space Discriminative Language Modeling

Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud’hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark, Kenji Sagae, Murat Saraçlar, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

ICASSP 2012
Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

Signal Processing Magazine (2012)
Distributed Acoustic Modeling with Back-off N-grams

Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

Proceedings of ICASSP 2012, IEEE, pp. 4129-4132
Distributed Discriminative Language Models for Google Voice Search

Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope

Proceedings of ICASSP 2012, IEEE, pp. 5017-5021
Estimating Word-Stability During Incremental Speech Recognition

Ian McGraw, Alexander Gruenstein

Interspeech (2012)
Exemplar-Based Processing for Speech Recognition: An Overview

Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Dirk Van Compernolle, Kris Demuynck, Jort F. Gemmeke, Jerome R. Bellegarda, Shiva Sundaram

IEEE Signal Process. Mag., vol. 29 (2012), pp. 98-113
Google's Cross-Dialect Arabic Voice Search

Fadi Biadsy, Pedro J. Moreno, Martin Jansche

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444
Hallucinated N-Best Lists for Discriminative Language Modeling

Kenji Sagae, Maider Lehr, Emily Tucker Prud’hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraçlar, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)
Haptic Voice Recognition Grand Challenge

K. Sim, S. Zhao, K. Yu, H. Liao

14th ACM International Conference on Multimodal Interaction (2012)
Improved Prediction of Nearly-Periodic Signals

Bastiaan Kleijn, Jan Skoglund

International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of Hours of Unsupervised, Noisy Data

Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440
Japanese and Korean Voice Search

Mike Schuster, Kaisuke Nakajima

International Conference on Acoustics, Speech and Signal Processing, IEEE (2012), pp. 5149-5152
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Leif Johnson, Michael Riley, Peng Xu, Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt

University of Toronto (2012)
Large Scale Language Modeling in Automatic Speech Recognition

Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar

Google (2012)
Large-scale Discriminative Language Model Reranking for Voice Search

Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope

Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics, pp. 41-49
Learning improved linear transforms for speech recognition

Andrew Senior, Youngmin Cho, Jason Weston

ICASSP, IEEE (2012)
Music Models for Music-Speech Separation

Thad Hughes, Trausti Kristjansson

ICASSP, IEEE (2012), pp. 4917-4920
Optimal Size, Freshness and Time-frame for Voice Search Vocabulary

Maryam Kamvar, Ciprian Chelba

Google (2012)
Recognition of Multilingual Speech in Mobile Applications

Hui Lin, Jui-Ting Huang, Francoise Beaufays, Brian Strope, Yun-hsuan Sung

ICASSP (2012)
Recurrent Neural Networks for Noise Reduction in Robust ASR

Andrew Maas, Quoc V. Le, Tyler M. O’Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng

INTERSPEECH (2012)
Semi-supervised Discriminative Language Modeling for Turkish ASR

Murat Saraçlar, Daniel M. Bikel, Keith Hall, Kenji Sagae

2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, IEEE, Kyoto, Japan
Spectral Intersections for Non-Stationary Signal Separation

Trausti Kristjansson, Thad Hughes

Proceedings of InterSpeech 2012, Portland, OR
Speech/Nonspeech Segmentation in Web Videos

Ananya Misra

Proceedings of InterSpeech 2012
ViSQOL: The Virtual Speech Quality Objective Listener

Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

International Workshop on Acoustic Signal Enhancement 2012 (IWAENC2012)
Voice Query Refinement

Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk

Interspeech (2012)
A Web-Based Tool for Developing Multilingual Pronunciation Lexicons

Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa

12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332
Bayesian Language Model Interpolation for Mobile Speech Input

Cyril Allauzen, Michael Riley

Interspeech 2011, pp. 1429-1432
Deploying Google Search by Voice in Cantonese

Yun-hsuan Sung, Martin Jansche, Pedro Moreno

12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868
Discriminative Features for Language Identification

C. Alberti, M. Bacchiani

INTERSPEECH (2011)
Improving the speed of neural networks on CPUs

Vincent Vanhoucke, Andrew Senior, Mark Z. Mao

Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Michael Riley, Peng Xu, Thorsten Brants, Vida Ha, Will Neveitt

OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)
Recognizing English Queries in Mandarin Voice Search

Hung-An Chang, Yun-hsuan Sung, Brian Strope, Francoise Beaufays

ICASSP (2011)
Speech Retrieval

Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar

Spoken Language Understanding, John Wiley and Sons, Ltd (2011), pp. 417-446
Summary of Opus listening test results

Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund

IETF (2011)
TechWare: Mobile Media Search Resources [Best of the Web]

Z. Liu, M. Bacchiani

IEEE Signal Processing Magazine, vol. 28 (2011), pp. 142-145
Unsupervised Testing Strategies for ASR

Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei

Interspeech 2011, pp. 1685-1688
Challenges in Automatic Speech Recognition

Ciprian Chelba, Johan Schalkwyk, Michiel Bacchiani

Interspeech 2010
Decision Tree State Clustering with Word and Syllable Features

Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan

Interspeech, ISCA (2010), pp. 2958-2961
Discriminative Topic Segmentation of Text and Speech

Mehryar Mohri, Pedro Moreno, Eugene Weinstein

International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)
Google Search by Voice: A Case Study

Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope

Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90
On-Demand Language Model Interpolation for Mobile Speech Input

Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk

Interspeech (2010), pp. 1812-1815
Search by Voice in Mandarin Chinese

Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno

Interspeech 2010, pp. 354-357
Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models

Francoise Beaufays, Vincent Vanhoucke, Brian Strope

Proc Interspeech (2010)
A new quality measure for topic segmentation of text and speech

Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein

Conference of the International Speech Communication Association (Interspeech) (2009)
Restoring Punctuation and Capitalization in Transcribed Speech

Agustín Gravano, Martin Jansche, Michiel Bacchiani

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4741-4744
Revisiting Graphemes with Increasing Amounts of Data

Yun-Hsuan Sung, Thad Hughes, Francoise Beaufays, Brian Strope

ICASSP, IEEE (2009)
Web-derived Pronunciations

Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292
Confidence Scores for Acoustic Model Adaptation

C. Gollan, M. Bacchiani

Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2008)
Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing

Michiel Bacchiani, Francoise Beaufays, Johan Schalkwyk, Mike Schuster, Brian Strope

Proc. ICASSP (2008)
Retrieval and Browsing of Spoken Content

Ciprian Chelba, Timothy J. Hazen, Murat Saraçlar

Signal Processing Magazine, IEEE, vol. 25 (2008), pp. 39-49
Speech Recognition with Weighted Finite-State Transducers

Mehryar Mohri, Fernando C. N. Pereira, Michael Riley

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
Speech Recognition with Weighted Finite-State Transducers

Mehryar Mohri, Fernando C. N. Pereira, Michael Riley

Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)