Google Machine Perception Team

Enabling machines to achieve human-level intelligence at interpreting, reasoning about, and transforming sensory data.

About our work

Research in Machine Perception tackles the hard problems of understanding images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality.

Our technology powers products across Alphabet, including image understanding in Search and Google Photos, camera enhancements for the Pixel Phone, handwriting interfaces for Android, optical character recognition for Google Drive, video understanding and summarization for YouTube, Google Cloud, Google Photos and Nest, as well as mobile apps including Motion Stills, PhotoScan and Allo.

We actively contribute to the open source and research communities. Our pioneering deep learning advances, such as Inception and Batch Normalization, are available in TensorFlow. Further, we have released several large-scale datasets for machine learning, including: AudioSet (audio event detection); AVA (human action understanding in video); Open Images (image classification and object detection); and YouTube-8M (video labeling).

Some of Our Projects

Portrait mode, a new feature of the Pixel 2 smartphone, lets you capture shallow depth-of-field pictures of family, friends, and flowers. It runs on both the rear-facing and selfie cameras, even though neither is a dual camera. How is this possible? Lots of machine learning, and some other tricks.
The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models. It is designed to support state-of-the-art models while allowing for rapid exploration and research, and includes a collection of detection models pre-trained on the COCO dataset.
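Models exported from the API return their detections as parallel lists under keys such as 'detection_boxes', 'detection_scores', and 'detection_classes'. As a hedged illustration of working with results in that convention, the helper below (a hypothetical function, not part of the API itself) keeps only detections above a confidence threshold:

```python
def filter_detections(result, score_threshold=0.5):
    """Keep detections whose score meets the threshold.

    `result` is assumed to follow the Object Detection API's output
    convention: parallel lists under 'detection_boxes',
    'detection_scores', and 'detection_classes'.
    """
    kept = []
    for box, score, cls in zip(result['detection_boxes'],
                               result['detection_scores'],
                               result['detection_classes']):
        if score >= score_threshold:
            kept.append({'box': box, 'score': score, 'class': cls})
    return kept
```

In practice you would run an exported detection model on an image first and pass its output dictionary to a post-processing step like this one.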
Motion Stills, an app for iOS and Android, applies our video stabilization technology to your Live Photos, allowing you to freeze the background into a still photo or create sweeping cinematic pans. Further, you can create even more beautiful videos and fun GIFs using motion-tracked text overlays, super-resolution videos, and automatic cinemagraphs.
Virtual Reality (VR) enables remarkably immersive experiences. However, sharing these experiences with others can be difficult, as VR headsets make it challenging to create a complete picture of the people participating in the experience. To address this problem, we reveal the user’s face by virtually “removing” the headset, creating a realistic see-through effect.
RAISR (Rapid and Accurate Image Super-Resolution) uses machine learning to produce high-quality versions of low-resolution images, removing aliasing artifacts that may be present in the original image. RAISR produces results comparable to or better than currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to run on a mobile device in real time.
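The core idea behind this family of methods is to pair a cheap upscaler with a learned enhancement filter. The sketch below is a deliberately simplified illustration of that two-stage structure, not the RAISR algorithm itself: it replicates pixels for a 2x upscale and then applies a single fixed sharpening kernel, whereas real RAISR selects among many learned filters based on local gradient statistics.

```python
def upsample_2x(img):
    # Cheap 2x upscaling by pixel replication (a stand-in for the
    # inexpensive upsampler used as RAISR's first stage).
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

# A single fixed sharpening kernel (sums to 1), standing in for the
# bank of learned filters a real RAISR implementation would use.
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

def apply_filter(img, kernel):
    # 3x3 convolution over the interior; border pixels are left as-is.
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

def super_resolve(img):
    # Stage 1: cheap upscale. Stage 2: enhancement filter.
    return apply_filter(upsample_2x(img), SHARPEN)
```

Because the kernel sums to 1, flat regions pass through unchanged while edges are accentuated, which is the intuition behind restoring detail after a cheap upscale.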
PhotoScan, an app for iOS and Android, allows you to digitize photo prints with just a smartphone. One of the key features of PhotoScan is the ability to remove glare from prints by using a unique blend of computer vision and image processing techniques.
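One intuition behind multi-frame glare removal is that specular glare is bright and moves as the camera moves, so combining several aligned shots lets the glare-free observations win. The sketch below is a minimal illustration of that intuition under the assumption that frames are already aligned; it is not Google's actual PhotoScan pipeline, which blends computer vision and image processing techniques well beyond a per-pixel minimum.

```python
def merge_min(frames):
    """Merge pre-aligned grayscale frames by per-pixel minimum.

    Specular glare is brighter than the underlying print, so for each
    pixel the minimum across frames tends to keep a glare-free value.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[min(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]
```

In a real pipeline the frames would first be registered to each other, and the merge would be more robust than a raw minimum (which is sensitive to shadows and noise).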
We believe strongly in sharing our findings with the world, fostering an open culture that encourages researchers to disseminate their work among the broader scientific community.

Some of Our Team

“Google is unique in being one of the few places in the world where we have the computational resources and the data to make giant strides towards the dream of solving computer vision.”
“We interpret the world around us through our senses. Our team helps make Google products more helpful and delightful by giving our systems the ability to understand and reason about images, video and sound.”

Join the Team