Athena

Solving fundamental computational problems that deliver meaningful impact for Google’s products, society, and scientific progress.

Athena

About the team

Athena is an international team of research scientists and engineers who tackle product-inspired problems with novel solutions to assist, complement, empower, and inspire people — from the everyday to the imaginative. Our work spans algorithms, artificial intelligence (AI), language understanding, and many other fields, and yields state-of-the-art breakthroughs in areas like efficiency, privacy, and user engagement.

We collaborate closely with partners across Google to take discoveries from publication to implementation for the Company's largest and most trusted products. Beyond Google's portfolio of products and services, our contributions to AI, computer science and machine learning power scientific advances for climate science, journalism, microeconomics and other data-driven disciplines.

We recognize that AI is a foundational and transformational technology and are proud to contribute to a long history of responsible innovation. Our commitment to Responsible AI principles ensure we develop and use technologies in ways that are socially beneficial, avoid bias, are built and tested for safety, are accountable to people and aligned with our values

Team focus summaries

Graph-based learning

We extend machine learning approaches to better model the relationships contained in information networks. These models (e.g., semi-supervised similarity ranking & clustering, neural graph embedding, and graph convolutional approaches) are useful in a wide range of machine learning applications.

Learn more

Market algorithms

Auction theory, mechanism design, and advanced algorithms serve to improve Ads and other market-based products

Learn more

Operations research

Applying integer programming, linear programming, constraint programming, and graph algorithms to solve problems at scale for transportation, search, natural language understanding, computer vision, robotics and more.

Learn more

Language

We advance the state of the art in natural language technologies and build systems that learn to understand and generate language in context.

Learn more

Large-scale machine learning

We focus on large scale machine learning including supervised learning (e.g. deep learning and kernel-based learning), and semi/unsupervised learning (e.g. streaming clustering and efficient similarity search). The research areas include distributed optimization, personalization and privacy-preserving learning, on-device learning and inference, recommendation systems, data-dependent hashing, and learning-based vision. We develop principled approaches and apply them to Google’s products. Our team regularly publishes in top-tier learning conferences and journals. Our team’s work has been applied across Google, powering Search and Display Ads, YouTube, Android, Play, Gmail, Assistant and Google Shopping.

Online clustering

We provide fast clustering of the datasets that can scale to billions of datapoints, and a streaming throughput of hundreds of thousands of points per second. The goal is to provide scalable nonparametric clustering without making simplistic generative assumptions like convexity of clusters which are rarely true in practice. The team develops techniques that can handle drift in data distributions over time. These techniques are being used in a large number of applications including dynamic spam detection in multiple products and semantic expansion in NLP.

Modeling and data science

We sift through data to discover, understand, and model implicit signals in user behavior. We partner with Product Areas such as Ads, YouTube, Android, and more to add machine learning functionality to products across Google. Due to the open ended nature of data mining, ongoing projects vary and currently include smart notifications on Android, Ads Pricing optimizations, differential privacy work, and more.

Structured data

The goals of the Structured Data group are: 1) working with various product teams closely and leverage our expertise in structured data to solve challenging technical problems and initiate new product features; 2) providing scientific expertise in computational journalism across Google in the fight against digital misinformation; 3) drive a long-term agenda that will advance state-of-the-art research in structured data with real world impact.

Scalable matching

We develop techniques for large scale similarity search in massive databases with arbitrary data types (sparse or dense high dimensional data) and similarity measures (metric/non-metric, potentially learned from data). The focus has been on developing data-dependent ML-based hashing techniques and tree-hash hybrids that are driving a multitude of applications at Google. This team also develops techniques for fast inference in machine learning models including neural networks, often improving the speeds over 50x while maintaining near exact accuracy.

Speech and language algorithms

Our mission is to accurately and efficiently represent, combine, optimize and search models of speech and text. In particular, we devise automata, grammars, neural and other models that represent word histories, context-dependent lexicons for speech and keyboard, written-to-spoken transductions and extractions of dates, times, currency, measures, etc, and transliteration and contextual models of language. These can be combined and optimized to give high-accuracy, efficient speech recognition and synthesis, text normalization, and more. We provide efficient decoding algorithms to search these models. This work is used extensively in Google's speech and text processing infrastructure.

Sensitive content detection

Our mission is to create a comprehensive set of classifiers for detecting offensive, inappropriate & controversial content in images and video. We accomplish this using a variety of techniques, including ensembles of ML models that are trained on images and text from the web. We also apply transfer learning on deep vision models for domain-specific classifier creation.

Semi-supervised and unsupervised machine learning

Semi-supervised learning is increasingly critical to solving many real-world product problems where data is sparse, sparsely labeled, or noisy. We develop semi-supervised and unsupervised machine learning systems that operate at Google scale. We apply our research to a broad range of problems, including query understanding, conversation understanding, and media understanding.

ML model compression for mobile devices

We develop systems for transforming cloud-resident ML models to highly efficient models that run on resource-constrained mobile devices.

Media understanding in conversations

We enrich electronic conversations by understanding media using multi-modal signals from images, video, text, and the web. We accomplish this by marrying machine vision models with ML-enabled natural language understanding and generation systems.

Combinatorial machine learning

Many fundamental learning problems we solve at Google have non-trivial combinatorial structure that prevents the application of general purpose ML algorithms. They exhibit complex and discontinuous loss functions (e.g., in pricing) or combinatorial explosions (such as contextual bandits, feature selection, or integer programming) and may require solutions that are robust against strategic behavior. Our team pushes the boundaries in these areas through research that blends techniques from learning theory, game theory, and discrete/continuous optimization.

Glassbox

Glassbox Learning does R&D into making ML more controllable and interpretable, without sacrificing accuracy. An important line of research is how to translate policy goals about metrics and fairness into machine learning training. For interpretability, Glassbox provides end-to-end guarantees on the relationship of inputs to outputs, such as monotonicity and other shape constraints. To achieve these goals, Glassbox researches and utilizes new algorithms for constrained optimization.

Dataset search

Dataset Search, also known as Science Search, is a project to index all datasets on the web and to make the metadata (and, where possible, the data itself) searchable and useful. Datasets and related data tend to be spread across multiple data repositories on the web. In most cases, data is not linked nor has it been indexed which makes searching tedious or, in some cases, impossible.