Speech Processing

102 Publications

  •  

    An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech

    Alastair H Moore, Patrick A Naylor, Jan Skoglund

    Proceedings of European Signal Processing Conference (EUSIPCO) 2014

  •    

    Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks

    Georg Heigold, Erik McDermott, Vincent Vanhoucke, Andrew Senior, Michiel Bacchiani

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Firenze, Italy (2014)

  •   

    Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition

    M. Bacchiani, A. Senior, G. Heigold

    Proceedings of the European Conference on Speech Communication and Technology (2014) (to appear)

  •    

    Automatic Language Identification Using Deep Neural Networks

    Ignacio Lopez-Moreno, Javier Gonzalez-Dominguez, Oldrich Plchot

    Proc. ICASSP, IEEE (2014)

  •   

    Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks

    Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Hasim Sak

    Interspeech (2014)

  •    

    Autoregressive Product of Multi-frame Predictions Can Improve the Accuracy of Hybrid Models

    Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton

    Proceedings of Interspeech 2014

  •    

    Computer-aided quality assurance of an Icelandic pronunciation dictionary

    Martin Jansche

    LREC 2014, Reykjavik

  •   

    Context Dependent State Tying for Speech Recognition using Deep Neural Network Acoustic Models

    M. Bacchiani, D. Rybach

    Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)

  •    

    Deep Mixture Density Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

    Heiga Zen, Andrew Senior

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 3872-3876

  •   

    Deep Neural Networks for Small Footprint Text-dependent Speaker Verification

    Ehsan Variani, Xin Lei, Erik McDermott, Ignacio Lopez Moreno, Javier Gonzalez-Dominguez

    Proc. ICASSP, IEEE (2014)

  •    

    Discriminative pronunciation modeling for dialectal speech recognition

    Maider Lehr, Kyle Gorman, Izhak Shafran

    Proc. Interspeech (2014) (to appear)

  •   

    Fine Context, Low-rank, Softplus Deep Neural Networks for Mobile Speech Recognition

    Andrew Senior, Xin Lei

    Proc. ICASSP (2014) (to appear)

  •    

    Frame by Frame Language Identification in Short Utterances using Deep Neural Networks

    Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez

    Neural Networks Special Issue: Neural Network Learning in Big Data (2014) (to appear)

  •   

    GMM-Free DNN Training

    A. Senior, G. Heigold, M. Bacchiani, H. Liao

    Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2014)

  •    

    Improving DNN Speaker Independence with I-vector Inputs

    Andrew Senior, Ignacio Lopez-Moreno

    Proc. ICASSP, IEEE (2014)

  •    

    JustSpeak: Enabling Universal Voice Control on Android

    Yu Zhong, T. V. Raman, Casey Burkhardt, Fadi Biadsy, Jeffrey P. Bigham

    W4A 2014

  •    

    Large-Scale Speaker Identification

    Ludwig Schmidt, Matthew Sharifi, Ignacio Lopez-Moreno

    Proc. ICASSP, IEEE (2014)

  •   

    Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

    Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao

    Interspeech (2014) (to appear)

  •  

    Sinusoidal Interpolation Across Missing Data

    W. Bastiaan Kleijn, Turaj Zakizadeh Shabestary, Jan Skoglund

    International Workshop on Acoustic Signal Enhancement 2014 (IWAENC 2014), pp. 71-75

  •   

    Small-Footprint Keyword Spotting using Deep Neural Networks

    Guoguo Chen, Carolina Parada, Georg Heigold

    ICASSP, IEEE (2014)

  •    

    Statistical Parametric Speech Synthesis

    Heiga Zen

    UKSpeech Conference, Edinburgh, UK (2014)

  •    

    Training Data Selection Based On Context-Dependent State Matching

    Olivier Siohan

    Proceedings of ICASSP 2014 (to appear)

  •    

    Word Embeddings for Speech Recognition

    Samy Bengio, Georg Heigold

    Proceedings of the 15th Conference of the International Speech Communication Association, Interspeech (2014) (to appear)

  •   

    Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices

    Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen

    Interspeech (2013)

  •   

    An Empirical study of learning rates in deep neural networks for speech recognition

    Andrew Senior, Georg Heigold, Marc'Aurelio Ranzato, Ke Yang

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)

  •    

    Deep Learning in Speech Synthesis

    Heiga Zen

    8th ISCA Speech Synthesis Workshop, Barcelona, Spain (2013)

  •  

    Deep Neural Networks with Auxiliary Gaussian Mixture Models for Real-Time Speech Recognition

    Xin Lei, Hui Lin, Georg Heigold

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)

  •    

    Direct construction of compact context-dependency transducers from data

    David Rybach, Michael Riley, Chris Alberti

    Computer Speech & Language (2013) (to appear)

  •    

    Empirical Exploration of Language Modeling for the google.com Query Stream as Applied to Mobile Voice Search

    Ciprian Chelba, Johan Schalkwyk

    Mobile Speech and Advanced Natural Language Solutions, Springer Science+Business Media, New York (2013), pp. 197-229

  •   

    Language Model Verbalization for Automatic Speech Recognition

    Hasim Sak, Françoise Beaufays, Kaisuke Nakajima, Cyril Allauzen

    Proc. ICASSP, IEEE (2013) (to appear)

  •   

    Language Modeling Capitalization

    Françoise Beaufays, Brian Strope

    Proc. ICASSP, IEEE (2013) (to appear)

  •    

    Large Scale Distributed Acoustic Modeling With Back-off N-grams

    Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

    IEEE Transactions on Audio, Speech and Language Processing, vol. 21 (2013), pp. 1158-1169

  •   

    Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription

    Hank Liao, Erik McDermott, Andrew Senior

    ASRU (2013)

  •  

    Monitoring the Effects of Temporal Clipping on VoIP Speech Quality

    Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

    Interspeech 2013, pp. 1188-1192

  •   

    Multiframe Deep Neural Networks for Acoustic Modeling

    Vincent Vanhoucke, Matthieu Devin, Georg Heigold

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)

  •   

    Multilingual acoustic models using distributed deep neural networks

    Georg Heigold, Vincent Vanhoucke, Andrew Senior, Patrick Nguyen, Marc'Aurelio Ranzato, Matthieu Devin, Jeff Dean

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013)

  •    

    On Rectified Linear Units For Speech Processing

    M.D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q.V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, G.E. Hinton

    38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver (2013)

  •   

    Pre-Initialized Composition for Large-Vocabulary Speech Recognition

    Cyril Allauzen, Michael Riley

    Interspeech 2013, pp. 666-670

  •   

    Rapid Adaptation for Mobile Speech Applications

    M. Bacchiani

    Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2013)

  •   

    Rate-Distortion Optimization for Multichannel Audio Compression

    Minyue Li, Jan Skoglund, W. Bastiaan Kleijn

    2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

  •    

    Recurrent Neural Networks for Voice Activity Detection

    Thad Hughes, Keir Mierle

    ICASSP, IEEE (2013), pp. 7378-7382

  •   

    Robustness of Speech Quality Metrics to Background Noise and Network Degradations: Comparing VISQOL, PESQ and POLQA

    Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 3697-3701

  •    

    Smoothed marginal distribution constraints for language modeling

    Brian Roark, Cyril Allauzen, Michael Riley

    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52

  •   

    Speaker Adaptation of Context Dependent Deep Neural Networks

    Hank Liao

    International Conference on Acoustics, Speech, and Signal Processing (2013)

  •    

    Speech and Natural Language: Where Are We Now And Where Are We Headed?

    Ciprian Chelba

    Mobile Voice Conference, San Francisco (2013)

  •    

    Statistical Parametric Speech Synthesis Using Deep Neural Networks

    Heiga Zen, Andrew Senior, Mike Schuster

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2013), pp. 7962-7966

  •  

    Written-Domain Language Modeling for Automatic Speech Recognition

    Hasim Sak, Yun-hsuan Sung, Françoise Beaufays, Cyril Allauzen

    Interspeech (2013)

  •   

    iVector-based Acoustic Data Selection

    Olivier Siohan, Michiel Bacchiani

    Proceedings of Interspeech (2013)

  •    

    Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition

    Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke

    Proceedings of Interspeech 2012

  •    

    Building adaptive dialogue systems via Bayes-adaptive POMDP

    Shaowei Png, Joelle Pineau, B. Chaib-draa

    IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 8 (2012), pp. 917-927

  •  

    Chapter 17: Uncertainty Decoding. In Virtanen, Singh, & Raj (Eds.), Techniques for Noise Robustness in Automatic Speech Recognition

    Hank Liao

    Wiley (2012), pp. 463-485

  •   

    Continuous Space Discriminative Language Modeling

    Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud’hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark, Kenji Sagae, Murat Saraclar, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

    ICASSP 2012

  •    

    Deep Neural Networks for Acoustic Modeling in Speech Recognition

    Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

    Signal Processing Magazine (2012)

  •    

    Distributed Acoustic Modeling with Back-off N-grams

    Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

    Proceedings of ICASSP 2012, IEEE, pp. 4129-4132

  •    

    Distributed Discriminative Language Models for Google Voice Search

    Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope

    Proceedings of ICASSP 2012, IEEE, pp. 5017-5021

  •   

    Estimating Word-Stability During Incremental Speech Recognition

    Ian McGraw, Alexander Gruenstein

    Interspeech (2012)

  •    

    Google's Cross-Dialect Arabic Voice Search

    Fadi Biadsy, Pedro J. Moreno, Martin Jansche

    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 4441-4444

  •   

    Hallucinated N-Best Lists for Discriminative Language Modeling

    Kenji Sagae, Maider Lehr, Emily Tucker Prud’hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraçlar, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)

  •   

    Haptic Voice Recognition Grand Challenge

    K. Sim, S. Zhao, K. Yu, H. Liao

    14th ACM International Conference on Multimodal Interaction (2012)

  •   

    Improved Prediction of Nearly-Periodic Signals

    Bastiaan Kleijn, Jan Skoglund

    International Workshop on Acoustic Signal Enhancement 2012 (IWAENC 2012)

  •    

    Investigations on Exemplar-Based Features for Speech Recognition Towards Thousands of Hours of Unsupervised, Noisy Data

    Georg Heigold, Patrick Nguyen, Mitchel Weintraub, Vincent Vanhoucke

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Kyoto, Japan (2012), pp. 4437-4440

  •   

    Japanese and Korean Voice Search

    Mike Schuster, Kaisuke Nakajima

    International Conference on Acoustics, Speech and Signal Processing, IEEE (2012), pp. 5149-5152

  •    

    Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

    Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Leif Johnson, Michael Riley, Peng Xu, Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt

    University of Toronto (2012)

  •    

    Large Scale Language Modeling in Automatic Speech Recognition

    Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar

    Google (2012)

  •    

    Large-scale Discriminative Language Model Reranking for Voice Search

    Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope

    Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, Association for Computational Linguistics, pp. 41-49

  •    

    Learning improved linear transforms for speech recognition

    Andrew Senior, Youngmin Cho, Jason Weston

    ICASSP, IEEE (2012)

  •   

    Music Models for Music-Speech Separation

    Thad Hughes, Trausti Kristjansson

    ICASSP, IEEE (2012), pp. 4917-4920

  •    

    Optimal Size, Freshness and Time-frame for Voice Search Vocabulary

    Maryam Kamvar, Ciprian Chelba

    Google (2012)

  •    

    Recognition of Multilingual Speech in Mobile Applications

    Hui Lin, Jui-Ting Huang, Francoise Beaufays, Brian Strope, Yun-hsuan Sung

    ICASSP (2012)

  •   

    Semi-supervised Discriminative Language Modeling for Turkish ASR

    Murat Saraçlar, Daniel M. Bikel, Keith Hall, Kenji Sagae

    2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, IEEE, Kyoto, Japan

  •    

    Spectral Intersections for Non-Stationary Signal Separation

    Trausti Kristjansson, Thad Hughes

    Proceedings of Interspeech 2012, Portland, OR

  •    

    Speech/Nonspeech Segmentation in Web Videos

    Ananya Misra

    Proceedings of Interspeech 2012

  •   

    VISQOL: The Virtual Speech Quality Objective Listener

    Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte

    International Workshop on Acoustic Signal Enhancement 2012 (IWAENC 2012)

  •   

    Voice Query Refinement

    Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk

    Interspeech (2012)

  •    

    A Web-Based Tool for Developing Multilingual Pronunciation Lexicons

    Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa

    12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 3331-3332

  •    

    Bayesian Language Model Interpolation for Mobile Speech Input

    Cyril Allauzen, Michael Riley

    Interspeech 2011, pp. 1429-1432

  •    

    Deploying Google Search by Voice in Cantonese

    Yun-hsuan Sung, Martin Jansche, Pedro Moreno

    12th Annual Conference of the International Speech Communication Association (Interspeech 2011), pp. 2865-2868

  •   

    Discriminative Features for Language Identification

    C. Alberti, M. Bacchiani

    Interspeech (2011)

  •    

    Improving the speed of neural networks on CPUs

    Vincent Vanhoucke, Andrew Senior, Mark Z. Mao

    Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011

  •    

    Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

    Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Michael Riley, Peng Xu, Thorsten Brants, Vida Ha, Will Neveitt

    OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)

  •   

    Recognizing English Queries in Mandarin Voice Search

    Hung-An Chang, Yun-hsuan Sung, Brian Strope, Francoise Beaufays

    ICASSP (2011)

  •   

    Speech Retrieval

    Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran, Murat Saraçlar

    Spoken Language Understanding, John Wiley and Sons, Ltd (2011), pp. 417-446

  •    

    Summary of Opus listening test results

    Christian Hoene, Jean-Marc Valin, Koen Vos, Jan Skoglund

    IETF (2011)

  •   

    TechWare: Mobile Media Search Resources [Best of the Web]

    Z. Liu, M. Bacchiani

    IEEE Signal Processing Magazine, vol. 28 (2011), pp. 142-145

  •   

    Unsupervised Testing Strategies for ASR

    Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei

    Interspeech 2011, pp. 1685-1688

  •    

    Challenges in Automatic Speech Recognition

    Ciprian Chelba, Johan Schalkwyk, Michiel Bacchiani

    Interspeech 2010

  •    

    Decision Tree State Clustering with Word and Syllable Features

    Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan

    Interspeech, ISCA (2010), pp. 2958-2961

  •   

    Discriminative Topic Segmentation of Text and Speech

    Mehryar Mohri, Pedro Moreno, Eugene Weinstein

    International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)

  •   

    Google Search by Voice: A Case Study

    Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope

    Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, Springer (2010), pp. 61-90

  •    

    On-Demand Language Model Interpolation for Mobile Speech Input

    Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk

    Interspeech (2010), pp. 1812-1815

  •    

    Search by Voice in Mandarin Chinese

    Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno

    Interspeech 2010, pp. 354-357

  •    

    Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models

    Francoise Beaufays, Vincent Vanhoucke, Brian Strope

    Proc. Interspeech (2010)

  •   

    A new quality measure for topic segmentation of text and speech

    Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein

    Conference of the International Speech Communication Association (Interspeech) (2009)

  •    

    Restoring Punctuation and Capitalization in Transcribed Speech

    Agustín Gravano, Martin Jansche, Michiel Bacchiani

    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4741-4744

  •   

    Revisiting Graphemes with Increasing Amounts of Data

    Yun-Hsuan Sung, Thad Hughes, Francoise Beaufays, Brian Strope

    ICASSP, IEEE (2009)

  •    

    Web-derived Pronunciations

    Arnab Ghoshal, Martin Jansche, Sanjeev Khudanpur, Michael Riley, Morgan Ulinski

    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292

  •   

    Confidence Scores for Acoustic Model Adaptation

    C. Gollan, M. Bacchiani

    Proceedings of the International Conference on Acoustics, Speech and Signal Processing (2008)

  •   

    Deploying GOOG-411: Early Lessons in Data, Measurement, and Testing

    Michiel Bacchiani, Francoise Beaufays, Johan Schalkwyk, Mike Schuster, Brian Strope

    Proc. ICASSP (2008)

  •   

    Retrieval and Browsing of Spoken Content

    Ciprian Chelba, Timothy J. Hazen, Murat Saraçlar

    Signal Processing Magazine, IEEE, vol. 25 (2008), pp. 39-49

  •   

    Speech Recognition with Weighted Finite-State Transducers

    Mehryar Mohri, Fernando C. N. Pereira, Michael Riley

    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)

  •   

    Speech Recognition with Weighted Finite-State Transducers

    Mehryar Mohri, Fernando C. N. Pereira, Michael Riley

    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)