The goal of the Google Brain team's machine perception efforts is to improve a machine's ability to hear and see so that machines may interact naturally with humans. Historically, computers have been poor at perceiving visual and audio information that humans process with ease. In the last few years, advances in deep learning have changed this substantially, and visual and audio recognition systems continue to approach human-level performance.

Our team within Google Brain has focused on building deep learning systems to advance the state of the art in these domains and to apply these ideas to real products that improve the user experience. Several notable advances that have stemmed from researchers within our team and the wider Google research community include:

  • Advancing the state of the art in image recognition through steady progress in designing and scaling convolutional neural network architectures [Krizhevsky et al, 2012; Szegedy et al, 2014]. This work won the ImageNet ILSVRC Challenge in 2012 and 2014.
  • Replacing highly handcrafted and hand-tuned speech systems, built from carefully engineered component models, with deep recurrent and convolutional neural network architectures that are increasingly trained end-to-end [Jaitly et al, 2011; Sainath et al, 2015; Chan et al, 2016]. Our contribution to end-to-end models for speech recognition was recognized with the ICASSP 2016 Speech and Language Processing Student Paper Award.
  • Combining machine learning systems across different perceptual modalities to perform novel machine perception tasks, e.g., zero-shot learning and neural image captioning [Frome et al, 2012; Vinyals et al, 2015]. The latter work has the distinction of winning the first COCO Image Captioning Challenge in 2015.

Our long-term goal is to make machine perception a seamless component of future software systems, including mobile devices, robotics, and healthcare. While we have made great strides in the last few years, much work remains to be done, and we are excited about future directions.

Some of Our Publications

Publications by Year (Speech)

Publications by Year (Vision)