About our work
Google AI New York creates useful solutions to fundamental computational problems
in theory and algorithms, machine learning, journalism, speech, and other
data-driven disciplines, with impact on Google’s products and scientific progress.
To achieve this dual objective (impact on Google and scientific impact), we focus
on two vehicles: software libraries that carry research findings into products and
services, and publications that share the work with the community. Our teams and
focus areas are as follows:
- Supervised Machine Learning - The Supervised Machine Learning team has a
long history of building and applying ML techniques at Google, having previously
developed a Core Google API for supervised machine learning and more recently
researching and developing tools for the TensorFlow ecosystem (e.g. tf.Transform, kernel
methods, gradient boosted trees). We also actively collaborate with
product groups across Google (e.g. Docs, Search, Ads, Geo) to help deploy
ML-based solutions and publish cutting-edge research (e.g. automatic
hyperparameter tuning, compact multi-class gradient boosted trees).
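The gradient boosted trees mentioned above can be illustrated with a toy sketch. This is not the team's API or its TensorFlow implementation; it is a minimal, self-contained example of the general technique, fitting depth-1 regression trees ("stumps") to the residuals of the ensemble built so far:

```python
# Toy gradient boosted regression -- illustrative only, not the
# production (TensorFlow) implementation described above.

def fit_stump(xs, residuals):
    """Return the depth-1 regression tree minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.3):
    """Each round, a new stump corrects the current residuals."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = boost([1, 2, 3, 4, 5, 6], [1.0, 1.1, 0.9, 5.0, 5.2, 4.9])
```

With shrinkage (lr) below 1, each stump corrects only part of the remaining error, which is what makes the ensemble robust in practice.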
- Algorithms and Optimization - The Algorithms and Optimization team
comprises multiple overlapping research groups working on graph mining,
large-scale optimization, and market algorithms. We collaborate closely with
teams across Google, benefiting Ads, Search, YouTube, Play, Infrastructure, Geo,
Social, Image Search, Cloud and more. Along with these collaborations, we perform
research related to algorithmic foundations of machine learning, distributed
optimization, economics, data mining, and data-driven optimization. Our
researchers are involved in both long-term research efforts and immediate
applications of our technology.
- Large Scale Machine Learning - We focus on large scale machine learning
including supervised learning (e.g. deep learning and kernel-based learning), and
semi/unsupervised learning (e.g. streaming clustering and efficient similarity
search). The research areas include distributed optimization, personalization and
privacy-preserving learning, on-device learning and inference, recommendation
systems, data-dependent hashing, and learning-based vision. We develop principled
approaches and apply them to Google’s products. Our team regularly publishes in
top-tier learning conferences and journals. Our team’s work has been applied
across Google, powering Search and Display Ads, YouTube, Android, Play, Gmail,
Assistant and Google Shopping.
- Online Clustering - The Online Clustering team provides fast clustering of
datasets, scaling to billions of data points with a streaming throughput of
hundreds of thousands of points per second. The goal is to provide scalable
nonparametric clustering without simplistic generative assumptions, such as
cluster convexity, that rarely hold in practice. The team develops
techniques that can handle drift in data distributions over time. These
techniques are being used in a large number of applications including dynamic
spam detection in multiple products and semantic expansion in NLP.
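As a rough illustration of the single-pass, nonparametric style of clustering described above (the fixed radius threshold and 2-D points are simplifying assumptions for this sketch; the team's real techniques also handle distribution drift):

```python
import math

def online_cluster(stream, radius=1.0):
    """One-pass, nonparametric clustering sketch: each point joins the
    nearest center within `radius` (updating it incrementally) or starts
    a new cluster, so the number of clusters is not fixed in advance."""
    centers, counts = [], []
    for x, y in stream:
        best_i, best_d = None, radius
        for i, (cx, cy) in enumerate(centers):
            d = math.hypot(x - cx, y - cy)
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is None:
            centers.append((x, y))          # open a new cluster
            counts.append(1)
        else:
            n = counts[best_i] + 1          # running-mean update
            cx, cy = centers[best_i]
            centers[best_i] = (cx + (x - cx) / n, cy + (y - cy) / n)
            counts[best_i] = n
    return centers

centers = online_cluster([(0, 0), (0.1, 0.1), (5, 5), (5.1, 4.9), (-0.1, 0)])
```

Each point is processed once and only cluster centers are retained, which is what allows this style of algorithm to scale to streams of billions of points.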
- Coauthor - The Coauthor team’s objective is cross-lingual, cross-modal
access to dynamically organized information. We hope to make reading, writing or
watching an immersive experience by surfacing relevant information from the web,
possibly synthesized dynamically from across different sources or types of
content such as text, images, charts, and videos. Coauthor powers the web content
suggestions in Google Docs when users are writing a document. The team is
actively working on other applications.
- Graph Mining - Our mission is to build the most scalable library for graph
algorithms and analysis and apply it to a multitude of Google products. We
formalize data mining and machine learning challenges as graph problems and
perform fundamental research in those fields leading to publications in top
venues. Our algorithms and systems are used in a wide array of Google products
such as Search, YouTube, AdWords, Play, Maps, and Social.
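A flavor of the basic primitives such a library builds on: the sketch below (a textbook union-find, not the team's library) groups the nodes of a graph into connected components, one of the simplest graph-mining building blocks.

```python
def connected_components(edges):
    """Union-find sketch of a basic graph-mining primitive:
    grouping nodes into connected components."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edges:
        parent[find(u)] = find(v)          # union the two components

    groups = {}
    for u in parent:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())

components = connected_components([("a", "b"), ("b", "c"), ("d", "e")])
```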
- Large Scale Optimization - Our mission is to develop large-scale
optimization techniques and use them to improve the efficiency and robustness of
infrastructure at Google. We apply techniques from areas such as combinatorial
optimization, online algorithms, and control theory to make Google’s big
computational infrastructure do more with less. We combine online and offline
optimizations to achieve such goals as increasing throughput, decreasing latency,
minimizing resource contention, maximizing the efficacy of caches, and
eliminating unnecessary work in distributed systems. Our research is used in
critical infrastructure that supports products such as Search and Cloud.
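To make one of these goals concrete, "maximizing the efficacy of caches" starts from simple online eviction policies. The sketch below is the classic least-recently-used (LRU) policy, shown purely for illustration; it is not the policy used in Google's infrastructure.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: a simple online eviction policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touch "a", so "b" becomes the eviction candidate
cache.put("c", 3)    # exceeds capacity: "b" is evicted
```

An offline policy with knowledge of the full request sequence can do better, which is why combining online and offline optimization, as described above, pays off.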
- Modeling and Data Science - The Modeling and Data Science team sifts
through data to discover, understand, and model implicit signals in user
behavior. We partner with Product Areas such as Ads, YouTube, Android, and more
to add machine learning functionality to products across Google. Due to the
open-ended nature of data mining, ongoing projects vary and currently include smart
notifications on Android, Ads Pricing optimizations, differential privacy work,
and more.
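The differential privacy work mentioned above can be illustrated by the textbook Laplace mechanism (a generic sketch, not the team's system): a count query has sensitivity 1, so adding Laplace noise with scale 1/epsilon makes the released count epsilon-differentially private.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverting its CDF."""
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(values, epsilon, rng=None):
    """Epsilon-differentially-private count via the Laplace mechanism.
    A count query has sensitivity 1, so the noise scale is 1/epsilon."""
    rng = rng or random.Random()
    return len(values) + laplace_noise(1.0 / epsilon, rng)

# Fixed seed for reproducibility of this illustration only.
noisy = private_count(list(range(1000)), epsilon=1.0, rng=random.Random(0))
```

Smaller epsilon means stronger privacy but noisier answers; choosing that trade-off is part of the design work.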
- Market Algorithms - Our mission is to analyze, design, and deliver
economically and computationally efficient marketplaces across Google. Our
research serves to optimize display ads for DoubleClick’s reservation and
exchange as well as sponsored search and mobile ads.
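A canonical mechanism in this area is the generalized second-price (GSP) auction long associated with sponsored search. The sketch below is the textbook version only (ignoring quality scores and reserve prices, and not Google's actual pricing): bidders are ranked by bid, and each winner pays the next-highest bid.

```python
def generalized_second_price(bids, slots):
    """Textbook GSP auction sketch: rank bidders by bid; the winner of
    each slot pays the bid of the bidder ranked just below them."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(slots, len(ranked))):
        bidder, _ = ranked[i]
        # Price is the next-highest bid, or 0 if there is none.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

outcome = generalized_second_price({"a": 3.0, "b": 5.0, "c": 1.0}, slots=2)
```

Second-price-style payments decouple what you bid from what you pay, which is central to analyzing bidder incentives in such marketplaces.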
- Structured Data - Structured Data plays an essential role in Google's
products and features including Fact Check in Google News and Search, Knowledge
Panel, Structured Snippets, Search Q&A, etc. The goals of the Structured Data
group are: 1) working closely with various product teams, leveraging our
expertise in structured data to solve challenging technical problems and
initiate new product features; 2) providing scientific expertise in
computational journalism across Google in the fight against digital
misinformation; and 3) driving a
long-term agenda that will advance state-of-the-art research in structured data
with real world impact. We use a wide range of techniques including machine
learning, data mining, NLP, information retrieval and extraction.
- Scalable Matching - The Scalable Matching team develops techniques for
large scale similarity search in massive databases with arbitrary data types
(sparse or dense high dimensional data) and similarity measures
(metric/non-metric, potentially learned from data). The focus has been on
developing data-dependent ML-based hashing techniques and tree-hash hybrids that
are driving a multitude of applications at Google. This team also develops
techniques for fast inference in machine learning models including neural
networks, often improving speed by over 50x while maintaining near-exact
accuracy.
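One standard, publicly known instance of data-dependent-free hashing for similarity search is random-hyperplane LSH (SimHash), sketched below; it is shown only to illustrate the idea of hashing for similarity search, not the team's learned, data-dependent techniques.

```python
import random

def random_hyperplane_hash(dim, bits, seed=0):
    """Random-hyperplane LSH (SimHash) sketch: vectors with high cosine
    similarity receive hash codes with small Hamming distance."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def signature(v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0)
                     for p in planes)
    return signature

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

sig = random_hyperplane_hash(dim=3, bits=16)
close = hamming(sig([1.0, 0.9, 0.0]), sig([1.0, 1.1, 0.1]))
far = hamming(sig([1.0, 0.9, 0.0]), sig([-1.0, -1.0, 0.2]))
```

Comparing short binary codes instead of raw vectors is what makes search over massive databases feasible; learning the hash functions from data, as described above, sharpens the codes further.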
- Speech and Language Algorithms - Our team's focus is to accurately and
efficiently represent, combine, optimize and search models of speech and text. In
particular, we devise automata, grammars, neural and other models that represent
word histories, context-dependent lexicons for speech and keyboard,
written-to-spoken transductions and extractions of dates, times, currency,
measures, etc., and transliteration and contextual models of language. These can
be combined and optimized to give high-accuracy, efficient speech recognition and
synthesis, text normalization, and more. We provide efficient decoding algorithms
to search these models. This work is used extensively in Google's speech and text
processing infrastructure.
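A toy flavor of the written-to-spoken transductions described above (real systems compose weighted finite-state transducers; this regex sketch handles only small numbers and simple dollar amounts, and every rule here is illustrative):

```python
import re

NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "4": "four",
                "5": "five", "6": "six", "7": "seven", "8": "eight",
                "9": "nine"}

def spell_number(digits):
    """Map a small written number to its spoken form (else unchanged)."""
    return NUMBER_WORDS.get(digits, digits)

def normalize(text):
    """Toy written-to-spoken text normalization."""
    # "$5" -> "five dollars" ("$1" -> "one dollar")
    text = re.sub(r"\$(\d+)\b",
                  lambda m: spell_number(m.group(1)) +
                            (" dollar" if m.group(1) == "1" else " dollars"),
                  text)
    # Standalone digits -> number words.
    text = re.sub(r"\b(\d+)\b", lambda m: spell_number(m.group(1)), text)
    return text

spoken = normalize("I paid $5 for 2 books")
```

Encoding such rewrite rules as transducers, rather than ad hoc regexes, is what lets them be combined and jointly optimized with lexicons and language models.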