Machine Perception

Research in machine perception tackles the hard problems of understanding images, sounds, music and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications, such as content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception, which achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large Google Photos collections. The ability to mine meaningful information from multimedia is broadly applied throughout Google.

Recent Publications

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Marc Pollefeys
Alexandros Delitzas
Ayça Takmaz
Francis Engelmann
CVPR (2024) (to appear)
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
Sungjoo Yoo
Han-Byul Kim
Joo Hyung Lee
Hong-Seok Kim
Proc. of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI) (2024)
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Fabian Manhardt
Christina Tsalicoglou
Michael Niemeyer
3DV (2024)
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Guilherme Perrotta
Arjun Karpur
Ricardo Martin-Brualla
3DV (2024) (to appear)