Google AI New York

About our work

Google AI New York creates useful solutions to fundamental computational problems in theory and algorithms, machine learning, journalism, speech, and other data-driven disciplines, with impact on Google’s products and scientific progress.

To achieve this double objective (impact on Google and scientific impact), we focus on two tools: software libraries that carry research findings into products and services, and publications that make the work known to the community. Our teams and focus areas are as follows:

  • Supervised Machine Learning - The Supervised Machine Learning team has a long history of building and applying ML techniques at Google, having previously developed a Core Google API for supervised machine learning and more recently researching and developing tools for the TensorFlow ecosystem (e.g. tf.Transform, kernel methods, gradient boosted trees). We also actively collaborate with product groups across Google (e.g. Docs, Search, Ads, Geo) to help deploy ML-based solutions, and we publish cutting-edge research (e.g. automatic hyperparameter tuning, compact multi-class gradient boosted trees).
  • Algorithms and Optimization - The Algorithms and Optimization team comprises multiple overlapping research groups working on graph mining, large-scale optimization, and market algorithms. We collaborate closely with teams across Google, benefiting Ads, Search, YouTube, Play, Infrastructure, Geo, Social, Image Search, Cloud and more. Along with these collaborations, we perform research related to algorithmic foundations of machine learning, distributed optimization, economics, data mining, and data-driven optimization. Our researchers are involved in both long-term research efforts and immediate applications of our technology. More on our work can be found here.
  • Large Scale Machine Learning - We focus on large-scale machine learning, including supervised learning (e.g. deep learning and kernel-based learning) and semi/unsupervised learning (e.g. streaming clustering and efficient similarity search). Our research areas include distributed optimization, personalization and privacy-preserving learning, on-device learning and inference, recommendation systems, data-dependent hashing, and learning-based vision. We develop principled approaches and apply them to Google’s products. Our team regularly publishes in top-tier learning conferences and journals, and its work has been applied across Google, powering Search and Display Ads, YouTube, Android, Play, Gmail, Assistant and Google Shopping.
  • Online Clustering - The Online Clustering team provides fast clustering that scales to datasets with billions of data points and sustains a streaming throughput of hundreds of thousands of points per second. The goal is to provide scalable nonparametric clustering without making simplistic generative assumptions, such as convexity of clusters, that rarely hold in practice. The team develops techniques that can handle drift in data distributions over time. These techniques are used in a large number of applications, including dynamic spam detection in multiple products and semantic expansion in NLP.
  • Coauthor - The Coauthor team’s objective is cross-lingual cross-modal access to dynamically organized information. We hope to make reading, writing or watching an immersive experience by surfacing relevant information from the web, possibly synthesized dynamically from across different sources or types of content such as text, images, charts, and videos. Coauthor powers the web content suggestions in Google Docs when users are writing a document. The team is actively working on other applications.
  • Graph Mining - Our mission is to build the most scalable library for graph algorithms and analysis and apply it to a multitude of Google products. We formalize data mining and machine learning challenges as graph problems and perform fundamental research in those fields leading to publications in top venues. Our algorithms and systems are used in a wide array of Google products such as Search, YouTube, AdWords, Play, Maps, and Social. More on our work can be found here.
  • Large Scale Optimization - Our mission is to develop large-scale optimization techniques and use them to improve the efficiency and robustness of infrastructure at Google. We apply techniques from areas such as combinatorial optimization, online algorithms, and control theory to make Google’s big computational infrastructure do more with less. We combine online and offline optimizations to achieve such goals as increasing throughput, decreasing latency, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems. Our research is used in critical infrastructure that supports products such as Search and Cloud. More on our work can be found here.
  • Modeling and Data Science - The Modeling and Data Science team sifts through data to discover, understand, and model implicit signals in user behavior. We partner with Product Areas such as Ads, YouTube, Android, and more to add machine learning functionality to products across Google. Due to the open-ended nature of data mining, ongoing projects vary; they currently include smart notifications on Android, Ads Pricing optimizations, differential privacy work, and more.
  • Market Algorithms - Our mission is to analyze, design, and deliver economically and computationally efficient marketplaces across Google. Our research serves to optimize display ads for DoubleClick’s reservation and exchange as well as sponsored search and mobile ads. More on our work here.
  • Structured Data - Structured data plays an essential role in Google's products and features, including Fact Check in Google News and Search, Knowledge Panel, Structured Snippets, Search Q&A, etc. The goals of the Structured Data group are: 1) working closely with product teams and leveraging our expertise in structured data to solve challenging technical problems and initiate new product features; 2) providing scientific expertise in computational journalism across Google in the fight against digital misinformation; 3) driving a long-term agenda that will advance state-of-the-art research in structured data with real-world impact. We use a wide range of techniques including machine learning, data mining, NLP, information retrieval and extraction.
  • Scalable Matching - The Scalable Matching team develops techniques for large-scale similarity search in massive databases with arbitrary data types (sparse or dense high-dimensional data) and similarity measures (metric/non-metric, potentially learned from data). The focus has been on developing data-dependent ML-based hashing techniques and tree-hash hybrids that drive a multitude of applications at Google. This team also develops techniques for fast inference in machine learning models, including neural networks, often improving speed by more than 50x while maintaining near-exact accuracy.
  • Speech and Language Algorithms - Our team's focus is to accurately and efficiently represent, combine, optimize and search models of speech and text. In particular, we devise automata, grammars, neural and other models that represent word histories; context-dependent lexicons for speech and keyboard; written-to-spoken transductions and extractions of dates, times, currency, measures, etc.; and transliteration and contextual models of language. These can be combined and optimized to give high-accuracy, efficient speech recognition and synthesis, text normalization, and more. We provide efficient decoding algorithms to search these models. This work is used extensively in Google's speech and text processing infrastructure.

Some of Our Team Members

Head, Research NY
“Google provides the most inspiring research environment in the world: learning problems with massively large data sets, truly motivating applications with great diversity, real impact of research on the design of products and solutions, and the best researchers and engineers.”
