Google Machine Perception Team

Enabling machines to achieve human-level intelligence at interpreting, reasoning about, and transforming sensory data.

About our work

Research in Machine Perception tackles the hard problems of understanding images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality.

Our technology powers products across Alphabet, including image understanding in Search and Google Photos, camera enhancements for the Pixel Phone, handwriting interfaces for Android, optical character recognition for Google Drive, video understanding and summarization for YouTube, Google Cloud, Google Photos and Nest, as well as mobile apps including Motion Stills, PhotoScan and Allo.

We actively contribute to the open source and research communities. Our pioneering deep learning advances, such as Inception and Batch Normalization, are available in TensorFlow. Further, we have released several large-scale datasets for machine learning, including: AudioSet (audio event detection); AVA (human action understanding in video); Open Images (image classification and object detection); and YouTube-8M (video labeling).

Some of Our Projects

Portrait mode, a new feature of the Pixel 2 smartphone, lets you capture shallow depth-of-field pictures of family, friends, and flowers. It runs on both the rear-facing and selfie cameras, even though neither is a dual camera. How is this possible? Lots of machine learning, and some other tricks.
The TensorFlow Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models. It is designed to support state-of-the-art models while allowing for rapid exploration and research, and includes a collection of detection models pre-trained on the COCO dataset.
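Models exported from the API return their detections as parallel lists under keys such as 'detection_boxes', 'detection_scores', and 'detection_classes'. As a hedged illustration of working with results in that convention, the helper below (a hypothetical function, not part of the API itself) keeps only detections above a confidence threshold:

```python
def filter_detections(result, score_threshold=0.5):
    """Keep detections whose score meets the threshold.

    `result` is assumed to follow the Object Detection API's output
    convention: parallel lists under 'detection_boxes',
    'detection_scores', and 'detection_classes'.
    """
    kept = []
    for box, score, cls in zip(result['detection_boxes'],
                               result['detection_scores'],
                               result['detection_classes']):
        if score >= score_threshold:
            kept.append({'box': box, 'score': score, 'class': cls})
    return kept
```

In practice you would run an exported detection model on an image first and pass its output dictionary to a post-processing step like this one.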
Motion Stills, an app for iOS and Android, applies our video stabilization technology to your Live Photos, allowing you to freeze the background into a still photo or create sweeping cinematic pans. Further, you can create even more beautiful videos and fun GIFs using motion-tracked text overlays, super-resolution videos, and automatic cinemagraphs.
Virtual Reality (VR) enables remarkably immersive experiences. However, sharing these experiences with others can be difficult, as VR headsets make it challenging to create a complete picture of the people participating in the experience. To address this problem, we reveal the user’s face by virtually “removing” the headset, creating a realistic see-through effect.
RAISR (Rapid and Accurate Image Super-Resolution) uses machine learning to produce high-quality versions of low-resolution images, removing aliasing artifacts that may be present in the original image. RAISR produces results comparable to or better than currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to run on a mobile device in real time.
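The core idea behind this family of methods is to pair a cheap upscaler with a learned enhancement filter. The sketch below is a deliberately simplified illustration of that two-stage structure, not the RAISR algorithm itself: it replicates pixels for a 2x upscale and then applies a single fixed sharpening kernel, whereas real RAISR selects among many learned filters based on local gradient statistics.

```python
def upsample_2x(img):
    # Cheap 2x upscaling by pixel replication (a stand-in for the
    # inexpensive upsampler used as RAISR's first stage).
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

# A single fixed sharpening kernel (sums to 1), standing in for the
# bank of learned filters a real RAISR implementation would use.
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

def apply_filter(img, kernel):
    # 3x3 convolution over the interior; border pixels are left as-is.
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

def super_resolve(img):
    # Stage 1: cheap upscale. Stage 2: enhancement filter.
    return apply_filter(upsample_2x(img), SHARPEN)
```

Because the kernel sums to 1, flat regions pass through unchanged while edges are accentuated, which is the intuition behind restoring detail after a cheap upscale.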
PhotoScan, an app for iOS and Android, allows you to digitize photo prints with just a smartphone. One of the key features of PhotoScan is the ability to remove glare from prints by using a unique blend of computer vision and image processing techniques.
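One intuition behind multi-frame glare removal is that specular glare is bright and moves as the camera moves, so combining several aligned shots lets the glare-free observations win. The sketch below is a minimal illustration of that intuition under the assumption that frames are already aligned; it is not Google's actual PhotoScan pipeline, which blends computer vision and image processing techniques well beyond a per-pixel minimum.

```python
def merge_min(frames):
    """Merge pre-aligned grayscale frames by per-pixel minimum.

    Specular glare is brighter than the underlying print, so for each
    pixel the minimum across frames tends to keep a glare-free value.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[min(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]
```

In a real pipeline the frames would first be registered to each other, and the merge would be more robust than a raw minimum (which is sensitive to shadows and noise).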
We believe strongly in sharing our findings with the world, fostering an open culture that encourages researchers to disseminate their work among the broader scientific community.

Some of Our Team

“Google is unique in being one of the few places in the world where we have the computational resources and the data to make giant strides towards the dream of solving computer vision.”
“We interpret the world around us through our senses. Our team helps make Google products more helpful and delightful by giving our systems the ability to understand and reason about images, video and sound.”

Join the Team