Tara N. Sainath

I received my PhD in Electrical Engineering and Computer Science from MIT in 2009. My PhD work focused on acoustic modeling for noise-robust speech recognition. After my PhD, I spent 5 years in the Speech and Language Algorithms group at IBM T.J. Watson Research Center before joining Google Research. I co-organized a special session on Sparse Representations at Interspeech 2010 in Japan and also organized a special session on Deep Learning at ICML 2013 in Atlanta. In addition, I am a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. My research interests are mainly in acoustic modeling, including deep neural networks, sparse representations, and adaptation methods.

Google Publications

Previous Publications

  •  

    Deep Convolutional Neural Networks for Large-scale Speech Tasks

    Tara N. Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George E. Dahl, Bhuvana Ramabhadran

    Neural Networks, vol. 64 (2015), pp. 39-48

  •  

    Deep Scattering Spectrum with deep neural networks

    Vijayaditya Peddinti, Tara N. Sainath, Shay Maymon, Bhuvana Ramabhadran, David Nahamoo, Vaibhava Goel

    ICASSP (2014), pp. 210-214

  •  

    Deep scattering spectra with deep neural networks for LVCSR tasks

    Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo

    INTERSPEECH (2014), pp. 900-904

  •  

    Improvements to filterbank and delta learning within a deep neural network framework

    Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George Saon, Bhuvana Ramabhadran

    ICASSP (2014), pp. 6839-6843

  •  

    Joint training of convolutional and non-convolutional neural networks

    Hagen Soltau, George Saon, Tara N. Sainath

    ICASSP (2014), pp. 5572-5576

  •  

    Parallel Deep Neural Network Training for Big Data on Blue Gene/Q

    I-Hsin Chung, Tara N. Sainath, Bhuvana Ramabhadran, Michael Picheny, John A. Gunnels, Vernon Austel, Upendra V. Chaudhari, Brian Kingsbury

    SC (2014), pp. 745-753

  •  

    Parallel deep neural network training for LVCSR tasks using Blue Gene/Q

    Tara N. Sainath, I-Hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John A. Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra V. Chaudhari

    INTERSPEECH (2014), pp. 1048-1052

  •  

    Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling

    Tara N. Sainath, Lior Horesh, Brian Kingsbury, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

    ASRU (2013), pp. 303-308

  •   

    An Evaluation of Posterior Modeling Techniques for Phonetic Recognition

    Rohit Prabhavalkar, Tara N. Sainath, David Nahamoo, Bhuvana Ramabhadran, Dimitri Kanevsky

    Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2013), pp. 7165-7169

  •  

    Deep convolutional neural networks for LVCSR

    Tara N. Sainath, Abdel-rahman Mohamed, Brian Kingsbury, Bhuvana Ramabhadran

    ICASSP (2013), pp. 8614-8618

  •  

    Developing speech recognition systems for corpus indexing under the IARPA Babel program

    Jia Cui, Xiaodong Cui, Bhuvana Ramabhadran, Janice Kim, Brian Kingsbury, Jonathan Mamou, Lidia Mangu, Michael Picheny, Tara N. Sainath, Abhinav Sethy

    ICASSP (2013), pp. 6753-6757

  •  

    Improvements to Deep Convolutional Neural Networks for LVCSR

    Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomás Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

    ASRU (2013), pp. 315-320

  •  

    Improvements to deep convolutional neural networks for LVCSR

    Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomás Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

    CoRR, vol. abs/1309.1501 (2013)

  •  

    Improving deep neural networks for LVCSR using rectified linear units and dropout

    George E. Dahl, Tara N. Sainath, Geoffrey E. Hinton

    ICASSP (2013), pp. 8609-8613

  •  

    Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling

    Tara N. Sainath, Lior Horesh, Brian Kingsbury, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

    CoRR, vol. abs/1309.1508 (2013)

  •  

    Learning filter banks within a deep neural network framework

    Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran

    ASRU (2013), pp. 297-302

  •  

    Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks

    Tara N. Sainath, Brian Kingsbury, Hagen Soltau, Bhuvana Ramabhadran

    IEEE Transactions on Audio, Speech & Language Processing, vol. 21 (2013), pp. 2267-2276

  •  

    Auto-encoder bottleneck features using deep belief networks

    Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran

    ICASSP (2012), pp. 4153-4156

  •  

    Deep Neural Networks for Acoustic Modeling in Speech Recognition

    Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara Sainath, Brian Kingsbury

    Signal Processing Magazine (2012)

  •  

    Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks

    Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran

    INTERSPEECH (2012)

  •  

    Exemplar-Based Processing for Speech Recognition: An Overview

    Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Dirk Van Compernolle, Kris Demuynck, Jort F. Gemmeke, Jerome R. Bellegarda, Shiva Sundaram

    IEEE Signal Process. Mag., vol. 29 (2012), pp. 98-113

  •  

    Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines

    Christian Plahl, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo

    ICASSP (2012), pp. 4165-4168

  •  

    N-best entropy based data selection for acoustic modeling

    Nobuyasu Itoh, Tara N. Sainath, Dan-Ning Jiang, Jie Zhou, Bhuvana Ramabhadran

    ICASSP (2012), pp. 4133-4136

  •  

    Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization

    Brian Kingsbury, Tara N. Sainath, Hagen Soltau

    INTERSPEECH (2012)

  •  

    A convex hull approach to sparse representations for exemplar-based speech recognition

    Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran, Parikshit M. Shah

    ASRU (2011), pp. 59-64

  •  

    A-Functions: A generalization of Extended Baum-Welch transformations to convex optimization

    Dimitri Kanevsky, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen

    ICASSP (2011), pp. 5164-5167

  •  

    Application specific loss minimization using gradient boosting

    Bin Zhang, Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran

    ICASSP (2011), pp. 4880-4883

  •  

    Convergence of Line Search A-Function Methods

    Dimitri Kanevsky, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran

    INTERSPEECH (2011), pp. 997-1000

  •  

    Deep Belief Networks using discriminative features for phone recognition

    Abdel-rahman Mohamed, Tara N. Sainath, George E. Dahl, Bhuvana Ramabhadran, Geoffrey E. Hinton, Michael A. Picheny

    ICASSP (2011), pp. 5060-5063

  •  

    Exemplar-based Sparse Representation phone identification features

    Tara N. Sainath, David Nahamoo, Bhuvana Ramabhadran, Dimitri Kanevsky, Vaibhava Goel, Parikshit M. Shah

    ICASSP (2011), pp. 4492-4495

  •  

    Making Deep Belief Networks effective for large vocabulary continuous speech recognition

    Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran, Petr Fousek, Petr Novák, Abdel-rahman Mohamed

    ASRU (2011), pp. 30-35

  •  

    Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition

    Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky

    INTERSPEECH (2011), pp. 785-788

  •  

    A voice-commandable robotic forklift working alongside humans in minimally-prepared outdoor environments

    Seth J. Teller, Matthew R. Walter, Matthew E. Antone, Andrew Correa, Randall Davis, Luke Fletcher, Emilio Frazzoli, Jim Glass, Jonathan P. How, Albert S. Huang, Jeong Hwan Jeon, Sertac Karaman, Brandon Luders, Nicholas Roy, Tara N. Sainath

    ICRA (2010), pp. 526-533

  •  

    An analysis of sparseness and regularization in exemplar-based methods for speech classification

    Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo

    INTERSPEECH (2010), pp. 2842-2845

  •  

    Bayesian compressive sensing for phonetic classification

    Tara N. Sainath, Avishy Carmi, Dimitri Kanevsky, Bhuvana Ramabhadran

    ICASSP (2010), pp. 4370-4373

  •  

    Data selection for language modeling using sparse representations

    Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky

    INTERSPEECH (2010), pp. 2258-2261

  •  

    Incorporating sparse representation phone identification features in automatic speech recognition using exponential families

    Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky

    INTERSPEECH (2010), pp. 1345-1348

  •  

    Kalman filtering for compressed sensing

    Dimitri Kanevsky, Avishy Carmi, Lior Horesh, Pini Gurfil, Bhuvana Ramabhadran, Tara N. Sainath

    FUSION (2010), pp. 1-8

  •  

    Sparse representation features for speech recognition

    Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy

    INTERSPEECH (2010), pp. 2254-2257

  •  

    Sparse representations for text categorization

    Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg

    INTERSPEECH (2010), pp. 2266-2269

  •  

    The use of isometric transformations and Bayesian estimation in compressive sensing for fMRI classification

    Avishy Carmi, Tara N. Sainath, Pini Gurfil, Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran

    ICASSP (2010), pp. 493-496

  •  

    A generalized family of parameter estimation techniques

    Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran

    ICASSP (2009), pp. 1725-1728

  •  

    An exploration of large vocabulary tools for small vocabulary phonetic recognition

    Tara N. Sainath, Bhuvana Ramabhadran, Michael Picheny

    ASRU (2009), pp. 359-364

  •  

    Island-driven search using broad phonetic classes

    Tara N. Sainath

    ASRU (2009), pp. 287-292

  •  

    A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition

    Tara N. Sainath, Victor Zue

    INTERSPEECH (2008), pp. 2378-2381

  •  

    Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding

    Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo

    INTERSPEECH (2008), pp. 277-280

  •  

    Gradient steepness metrics using extended Baum-Welch transformations for universal pattern recognition tasks

    Tara N. Sainath, Dimitri Kanevsky, Bhuvana Ramabhadran

    ICASSP (2008), pp. 4533-4536

  •  

    Audio classification using extended Baum-Welch transformations

    Tara N. Sainath, Victor Zue, Dimitri Kanevsky

    INTERSPEECH (2007), pp. 2969-2972

  •  

    Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations

    Tara N. Sainath, Dimitri Kanevsky, Bhuvana Ramabhadran

    ASRU (2007), pp. 306-311

  •  

    Unsupervised Audio Segmentation using Extended Baum-Welch Transformations

    Tara N. Sainath, Dimitri Kanevsky, Giridharan Iyengar

    ICASSP (1) (2007), pp. 209-212

  •  

    A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition

    Tara N. Sainath, Timothy J. Hazen

    ICASSP (1) (2006), pp. 525-528