D. Sculley
I'm currently interested in massive-scale machine learning problems for online advertising. My work includes both novel research and applied engineering.
For more details, see my home page.
Authored Publications
Adversarial Nibbler: A DataPerf Challenge for Text-to-Image Models
Hannah Kirk
Jessica Quaye
Charvi Rastogi
Max Bartolo
Oana Inel
Meg Risdal
Will Cukierski
Vijay Reddy
Online (2023)
Abstract
Machine learning progress has been strongly influenced by the data used for model training and evaluation. Only recently, however, have development teams shifted their focus more to the data, a shift triggered by the numerous reports of biases and errors discovered in AI datasets. The data-centric AI movement thus introduced the notion of iterating on the data used in AI systems, as opposed to the traditional model-centric approach, which typically treats the data as a given, static artifact in model development. With the recent advancement of generative AI, the role of data is even more crucial for successfully developing more factual and safe models. The DataPerf challenges follow up on recent successful data-centric challenges that drew attention to the data used for training and evaluating machine learning models. Specifically, Adversarial Nibbler focuses on data used for safety evaluation of generative text-to-image models. A typical bottleneck in safety evaluation is achieving representative diversity and coverage of different types of examples in the evaluation set. Our competition aims to gather a wide range of long-tail and unexpected failure modes for text-to-image models in order to identify as many new problems as possible, and to use various automated approaches to expand the dataset so that it is useful for training, fine-tuning, and evaluation.
Plex: Towards Reliability using Pretrained Large Model Extensions
Du Phan
Mark Patrick Collier
Zi Wang
Zelda Mariet
Clara Huiyi Hu
Neil Band
Tim G. J. Rudner
Joost van Amersfoort
Andreas Christian Kirsch
Rodolphe Jenatton
Honglin Yuan
Kelly Buchanan
Yarin Gal
ICML 2022 Pre-training Workshop (2022)
Abstract
A recent trend in artificial intelligence (AI) is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also exhibit puzzling failures. Examining tasks that probe the model's abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot learning). We devise 11 types of tasks over 36 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large-model extensions ("Plex") for the vision and language modalities. Plex greatly improves the state of the art across tasks, and as a pretrained model Plex unifies the traditional protocol of designing and tuning one model for each reliability task. We demonstrate scaling effects over model sizes and pretraining dataset sizes up to 4 billion examples. We also demonstrate Plex's capabilities on new tasks, including zero-shot open set recognition, few-shot uncertainty, and uncertainty in conversational language understanding.
Chapter 1B, "Data Management Principles," in _Reliable Machine Learning: Applying SRE Principles to ML in Production_
Cathy Chen
Kranti Parisa
Niall Richard Murphy
Todd Underwood
Reliable Machine Learning: Applying SRE Principles to ML in Production, O'Reilly (2022)
Abstract
Machine learning is rapidly becoming a vital tool for many organizations today. It's used to increase revenue, optimize decision making, understand customer behavior (and influence it), and solve problems across a very wide set of domains, in some cases at performance levels significantly superior to human ones. Machine learning touches billions of people multiple times a day.
Yet, industry-wide, the state of how organizations implement ML is, frankly, very poor. There isn't even a framework describing how best to do it; most people are just making it up as they go along. There are many consequences to this, including poorer-quality outcomes for both user and organization, lost revenue opportunities, legal exposure, and so on. Even worse is the fact that data, key to the success of ML, has become both a vitally important asset and a critical liability: organizations have not internalized how to manage it.
For all these reasons and more, the industry needs a framework: a way of understanding the issues around running actual, reliable, production-quality ML systems, and a collection of practical and conceptual approaches to "reliable ML for everyone". That makes it natural to reach for the conceptual framework provided by the Site Reliability Engineering discipline to provide that understanding. Bringing SRE approaches to running production systems helps them to be reliable, to scale well, and to be well monitored, well managed, and useful for customers; analogously, SRE approaches (including the Dickerson hierarchy, SLOs and SLIs, effective data handling, and so on) help machine learning systems accomplish the same ends.
Yet SRE approaches are not the totality of the story. We provide guidance for model developers, data scientists, and business owners to perform the nuts and bolts of their day-to-day jobs, while also keeping the bigger picture in mind. In other words, this book applies an SRE mindset to machine learning, and shows how to run an effective, efficient, and reliable ML system, whether you are a small startup or a planet-spanning megacorp. It describes what to do whether you are starting from a completely blank slate or already have significant scale, covering operational approaches, data-centric ways of thinking about production systems, and ethical guidelines, which are increasingly important in today's world.
Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology
Babak Alipanahi
Babak Behsaz
Zachary Ryan Mccaw
Emanuel Schorsch
Lizzie Dorfman
Sonia Phene
Andrew Walker Carroll
Anthony Khawaja
American Journal of Human Genetics (2021)
Abstract
Genome-wide association studies (GWAS) require accurate cohort phenotyping, but expert labeling can be costly, time-intensive, and variable. Here we develop a machine learning (ML) model to predict glaucomatous features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; P ≤ 5×10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR, with select loci near genes involved in neuronal and synaptic biology or known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR in independent datasets.
A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks
Lisie Lillianfeld
Katie Seaver
Jordan R. Green
InterSpeech 2021 (2021)
Abstract
Severe speech impairments limit the precision and range of producible speech sounds. As a result, generic automatic speech recognition (ASR) and keyword spotting (KWS) systems are unable to accurately recognize the utterances produced by individuals with severe speech impairments. This paper describes an approach in which simple speech sounds, namely isolated open vowels (e.g., /a/), are used in lieu of more motorically demanding keywords. A neural network (NN) is trained to detect these isolated open vowels uttered by individuals with speech impairments against background noise. The NN is trained in two phases: a pre-training phase uses samples from unimpaired speakers along with samples of background noise and unrelated speech, and a fine-tuning phase uses vowel samples collected from individuals with speech impairments. This model can be built into an experimental mobile app that allows users to activate preconfigured actions, such as alerting caregivers. Preliminary user testing indicates the model has the potential to be a useful and flexible emergency communication channel for motor- and speech-impaired individuals.
Population Based Optimization for Biological Sequence Design
Zelda Mariet
David Martin Dohan
ICML 2020 (2020)
Abstract
The use of black-box optimization for the design of new biological sequences is an emerging research area with potentially revolutionary impact. The cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds of large batches of sequences, a setting that off-the-shelf black-box optimization methods are ill-equipped to handle. We find that the performance of existing methods varies drastically across optimization tasks, posing a significant obstacle to real-world applications. To improve robustness, we propose population-based optimization (PBO), which generates batches of sequences by sampling from an ensemble of methods. The number of sequences sampled from any method is proportional to the quality of sequences it previously proposed, allowing PBO to combine the strengths of individual methods while hedging against their innate brittleness. Adapting the population of methods online using evolutionary optimization further improves performance. Through extensive experiments on in-silico optimization tasks, we show that PBO outperforms any single method in its population, proposing both higher-quality single sequences and more diverse batches. By its robustness and ability to design diverse, high-quality sequences, PBO is shown to be a new state-of-the-art approach to the batched black-box optimization of biological sequences.
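The core PBO sampling rule described in the abstract, drawing from each proposal method in proportion to the quality of its past proposals, can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation; the method functions, scoring function, and smoothing constant are hypothetical stand-ins.

```python
import random

def pbo_round(methods, weights, batch_size, score_fn, rng):
    """One PBO round: sample proposers in proportion to their weights,
    score the proposed sequences, and re-weight each method by the
    quality of what it delivered."""
    ids = rng.choices(range(len(methods)), weights=weights, k=batch_size)
    batch = [methods[i]() for i in ids]
    scores = [score_fn(seq) for seq in batch]
    credit = [1e-3] * len(methods)  # smoothing keeps every method alive
    for i, s in zip(ids, scores):
        credit[i] += max(s, 0.0)
    total = sum(credit)
    new_weights = [c / total for c in credit]
    return batch, new_weights
```

Sampling proportionally to past quality is what lets the ensemble hedge: a method that proposes poor sequences loses weight but is never zeroed out entirely.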
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Dan Moldovan
Babak Alipanahi
Alex Beutel
Christina Chen
Jon Deaton
Shaobo Hou
Ghassen Jerfel
Yian Ma
Akinori Mitani
Andrea Montanari
Christopher Nielsen
Thomas Osborne
Rajiv Raman
Kim Ramasamy
Martin Gamunu Seneviratne
Shannon Sequeira
Harini Suresh
Victor Veitch
Journal of Machine Learning Research (2020)
Abstract
ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
Abstract
Traditionally there has been a perceived trade-off between machine learning code that is easy to write and machine learning code that is fast, scalable, or easy to distribute, with platforms like TensorFlow, Theano, PyTorch, and Autograd inhabiting different points along this trade-off curve. PyTorch and Autograd offer the coding benefits of an imperative programming style and accept the computational trade-off of interpretive overhead. TensorFlow and Theano give the benefit of whole-program optimization based on defined computation graphs, with the trade-off of potentially cumbersome graph-based semantics and associated developer overhead, which become especially apparent for more complex model types that depend on control flow operators. We propose to capture the benefits of both paradigms, using an imperative programming style while enabling high-performance program optimization, by using staged programming via source code transformation to essentially compile native Python into a lower-level IR such as TensorFlow graphs. A key insight is to delay all type-dependent decisions until runtime, via dynamic dispatch. We instantiate these principles in AutoGraph, a piece of software that improves the programming experience of the TensorFlow machine learning library, and demonstrate strong usability improvements with no loss in performance compared to native TensorFlow graphs.
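The "delay type-dependent decisions until runtime via dynamic dispatch" idea can be illustrated with a toy conditional that dispatches on the runtime type of its predicate. This is a sketch of the principle only: the `smart_cond` name and the symbolic stub are hypothetical, and real AutoGraph rewrites Python source into TensorFlow graph operations.

```python
def smart_cond(pred, true_fn, false_fn):
    """Dispatch on the runtime type of `pred`: a plain Python bool is
    evaluated eagerly, while anything else is treated as a symbolic
    value and deferred to a graph-style conditional (stubbed here)."""
    if isinstance(pred, bool):
        return true_fn() if pred else false_fn()
    # A real implementation would emit something like
    # tf.cond(pred, true_fn, false_fn); we return a placeholder node
    # to keep the sketch self-contained.
    return ("symbolic_cond", pred, true_fn, false_fn)
```

The same source line thus works unchanged in eager and staged execution, which is what lets compiled code keep imperative semantics.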
TensorFlow.js: Machine Learning for the Web and Beyond
Daniel Smilkov
Nikhil Thorat
Yannick Assogba
Ann Yuan
Nick Kreeger
Ping Yu
Kangyi Zhang
Eric Nielsen
Stan Bileschi
Charles Nicholson
Sandeep N. Gupta
Sarah Sirajuddin
Rajat Monga
SysML, Palo Alto, CA, USA (2019)
Abstract
TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript. TensorFlow.js models run in a web browser and in the Node.js environment. The library is part of the TensorFlow ecosystem, providing a set of APIs that are compatible with those in Python, allowing models to be ported between the Python and JavaScript ecosystems. TensorFlow.js has empowered a new set of developers from the extensive JavaScript community to build and deploy machine learning models and enabled new classes of on-device computation. This paper describes the design, API, and implementation of TensorFlow.js, and highlights some of the impactful use cases.
Abstract
When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library’s coverage by augmenting it with synthetic spectra that are predicted from candidate molecules using machine learning. We contribute a lightweight neural network model that quickly predicts mass spectra for small molecules, averaging 5 ms per molecule with a recall-at-10 accuracy of 91.8%. Achieving high-accuracy predictions requires a novel neural network architecture that is designed to capture typical fragmentation patterns from electron ionization. We analyze the effects of our modeling innovations on library matching performance and compare our models to prior machine-learning-based work on spectrum prediction.
Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
Yaniv Ovadia
Sebastian Nowozin
Josh Dillon
Advances in Neural Information Processing Systems (2019)
Abstract
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve distributions that are skewed from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well-calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under conditions of distributional skew. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of distributional skew on accuracy and calibration. We find that traditional post-hoc calibration falls short and some Bayesian methods are intractable for very large data. However, methods that marginalize over models give surprisingly strong results across a broad spectrum.
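A standard way to quantify the calibration this paper evaluates is the expected calibration error (ECE): bin predictions by confidence and average the gap between mean confidence and accuracy across bins, weighted by bin population. A minimal sketch follows; the equal-width binning scheme is the common convention, an assumption rather than the paper's exact protocol.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the population-
    weighted average of |mean confidence - accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece
```

A model that reports 90% confidence and is right 90% of the time scores zero; an overconfident model under dataset shift scores high even when its accuracy on the training distribution looks fine.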
Deep Learning Classifies the Protein Universe
Theo Sanderson
Brandon Carter
Mark DePristo
Nature Biotechnology (2019)
Abstract
Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate roughly one third of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. To address this, we report a deep learning model that learns the relationship between unaligned amino acid sequences and their functional classification across all 17,929 families of the Pfam database. Using the Pfam seed sequences, we establish a rigorous benchmark assessment and find that a dilated convolutional model reduces the error of state-of-the-art BLASTp and pHMM models by a factor of nine. With 80% of the full Pfam database we train a protein family predictor that is more accurate and over 200 times faster than BLASTp, while learning sequence features such as structural disorder and transmembrane helices. Our model co-locates sequences from unseen families in embedding space far from existing families, allowing sequences from novel families to be classified. We anticipate that deep learning models will be a core component of future general-purpose protein function prediction tools.
The Inclusive Images Competition
Igor Ivanov
Miha Skalic
Pallavi Baljekar
Pavel Ostyakov
Roman Solovyev
Weimin Wang
Yoni Halpern
Springer Series (2019)
Abstract
Popular large image classification datasets that are drawn from the web present Eurocentric and Americentric biases that negatively impact the generalizability of models trained on them. In order to encourage the development of modeling approaches that generalize well to images drawn from locations and cultural contexts that are unseen or poorly represented at the time of training, we organized the Inclusive Images competition in association with Kaggle and the NeurIPS 2018 Competition Track Workshop. In this chapter, we describe the motivation and design of the competition, present reports from the top three competitors, and provide high-level takeaways from the competition results.
BriarPatches: Pixel-Space Interventions for Inducing Demographic Parity
Yoni Halpern
Neural Information Processing Systems: Workshop on Ethical, Social and Governance Issues in AI (2018)
Abstract
We introduce the BriarPatch, a pixel-space intervention that obscures sensitive attributes from representations encoded in pre-trained classifiers. The patches encourage internal model representations not to encode sensitive information, which has the effect of pushing downstream predictors towards exhibiting demographic parity with respect to the sensitive information. The net result is that BriarPatches provide an intervention mechanism available at the user level, complementing prior research on fair representations that was previously applicable only by model developers and ML experts.
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
Shreya Shankar
Yoni Halpern
NIPS 2017 workshop: Machine Learning for the Developing World
Abstract
Modern machine learning systems such as image classifiers rely heavily on large-scale data sets for training. Such data sets are costly to create, so in practice a small number of freely available, open-source data sets are widely used. Such strategies may be particularly important for ML applications in the developing world, where resources may be constrained and the cost of creating suitable large-scale data sets may be a blocking factor. However, we suggest that examining the geo-diversity of open data sets is critical before adopting a data set for such use cases. In particular, we analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we perform targeted analysis on classifiers that use these data sets as training data to assess the impact of these training distributions, and find strong differences in relative performance on images from different locales. These results emphasize the need to ensure geo-representation when constructing data sets for use in the developing world.
Abstract
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing the technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and to present an easy-to-follow road map for improving production readiness and paying down ML technical debt.
Abstract
Data cleaning and feature engineering are both common practices when developing machine learning (ML) models. However, developers are not always aware of best practices for preparing or transforming data for a given model type, which can lead to suboptimal representations of input features. To address this issue, we introduce the data linter, a new class of ML tool that automatically inspects ML data sets to 1) identify potential issues in the data and 2) suggest potentially useful feature transforms, for a given model type. As with traditional code linting, data linting automatically identifies potential issues or inefficiencies; codifies best practices and educates end-users about these practices through tool use; and can lead to quality improvements. In this paper, we provide a detailed description of data linting, describe our initial implementation of a data linter for deep neural networks, and report results suggesting the utility of using a data linter during ML model design.
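To give a flavor of what such a tool can catch, here is a minimal sketch of two of the simplest checks a data linter might run: constant columns (no signal) and numbers encoded as strings (a suboptimal feature representation). The function name and the specific checks are illustrative assumptions, not the paper's actual implementation.

```python
def lint_column(name, values):
    """Tiny data-linter sketch: return human-readable warnings for a
    single column of raw feature values."""
    warnings = []
    # Check 1: a constant column cannot help any model.
    if len(set(values)) <= 1:
        warnings.append(f"{name}: constant column, carries no signal")
    # Check 2: numeric data stored as strings usually wants a cast.
    if values and all(isinstance(v, str) for v in values):
        try:
            [float(v) for v in values]
            warnings.append(f"{name}: numbers stored as strings; cast to float")
        except ValueError:
            pass  # genuinely categorical strings are fine
    return warnings
```

A full linter would additionally suggest model-type-specific transforms (e.g., log-scaling heavy-tailed features), but the pattern of inspect-then-warn is the same.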
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
Cassandra Xia
Clemens Mewald
George Roumpos
Illia Polosukhin
Jamie Alexander Smith
Jianwei Xie
Lichan Hong
Mustafa Ispir
Philip Daniel Tucker
Yuan Tang
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada (2017)
Abstract
We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is to simplify writing cutting-edge machine learning models in a way that enables bringing those models into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a DSL or similar configuration language. We allow users to write code to define their models, but provide abstractions that guide developers to write models in ways conducive to productionization, as well as a unifying Estimator interface that makes it possible to write downstream infrastructure (distributed training, hyperparameter tuning, …) independent of the model implementation.
We balance the competing demands for flexibility and simplicity by offering APIs at different levels of abstraction, making common model architectures available “out of the box”, while providing a library of utilities designed to speed up experimentation with model architectures. To make out of the box models flexible and usable across a wide range of problems, these canned Estimators are parameterized not only over traditional hyperparameters, but also using feature columns, a declarative specification describing how to interpret input data.
We discuss our experience in using this framework in research and production environments, and show the impact on code health, maintainability, and development speed.
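The unifying-interface idea (downstream tooling written against `train`/`predict` regardless of which model sits behind `model_fn`) can be sketched in plain Python. This is a schematic of the pattern only; the names and the `model_fn` contract here are simplified stand-ins, not TensorFlow's actual Estimator API.

```python
class Estimator:
    """Schematic of the Estimator pattern: one small interface that
    downstream infrastructure can target independent of the model.
    Illustrative only; TensorFlow's real Estimator differs in detail."""

    def __init__(self, model_fn):
        self.model_fn = model_fn  # defines per-mode behavior
        self.state = None         # stands in for model parameters

    def train(self, input_fn, steps):
        for _ in range(steps):
            self.state = self.model_fn("train", input_fn(), self.state)
        return self

    def predict(self, input_fn):
        return self.model_fn("predict", input_fn(), self.state)


def mean_model(mode, batch, state):
    """Toy model_fn: learns the running mean of its training batches."""
    if mode == "train":
        total, count = state or (0.0, 0)
        return (total + sum(batch), count + len(batch))
    total, count = state
    return total / count
```

Any tool written against `Estimator.train`/`Estimator.predict` (a distributed trainer, a tuner) works unchanged when `mean_model` is swapped for a deep network, which is the point of the unified interface.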
Learning to count mosquitoes for the Sterile Insect Technique
Yaniv Ovadia
Yoni Halpern
Dilip Krishnan
Daniel Newburger
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017)
Abstract
Mosquito-borne illnesses such as dengue, chikungunya, and Zika are major global health problems, which are not yet addressable with vaccines and must be countered by reducing mosquito populations. The Sterile Insect Technique (SIT) is a promising alternative to pesticides; however, effective SIT relies on minimal releases of female insects. This paper describes a multi-objective convolutional neural net to significantly streamline the process of counting male and female mosquitoes released from a SIT factory and provides a statistical basis for verifying strict contamination rate limits from these counts despite measurement noise. These results are a promising indication that such methods may dramatically reduce the cost of effective SIT methods in practice.
Google Vizier: A Service for Black-Box Optimization
Subhodeep Moitra
ACM (2017), pp. 1487-1495
Abstract
Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems have become more complex. In this paper we describe Google Vizier, a Google-internal service for performing black-box optimization that has become the de facto parameter tuning engine at Google. Google Vizier is used to optimize many of our machine learning models and other systems, and also provides core capabilities to Google's Cloud Machine Learning HyperTune subsystem. We discuss our requirements, infrastructure design, underlying algorithms, and advanced features such as transfer learning and automated early stopping that the service provides.
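A black-box tuning service of this kind is typically driven through a suggest/observe loop: the client asks the study for trial parameters, runs the experiment, and reports the objective value back. The sketch below mimics that loop with plain random search; the class and method names are illustrative, not the Vizier API, and a real service would plug in smarter algorithms behind `suggest`.

```python
import random

class RandomSearchStudy:
    """Minimal suggest/observe loop in the style of a black-box
    parameter tuning service, with random search standing in for the
    service's actual optimization algorithms."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds           # {param_name: (low, high)}
        self.rng = random.Random(seed)
        self.trials = []               # (params, objective) pairs

    def suggest(self):
        """Propose parameters for the next trial."""
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}

    def observe(self, params, objective):
        """Report a completed trial's result back to the study."""
        self.trials.append((params, objective))

    def best(self):
        return max(self.trials, key=lambda t: t[1])
```

Because the client only ever calls `suggest`/`observe`, the service can swap in Bayesian optimization, transfer learning, or early stopping without any change to experiment code.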
Bayesian Optimization for a Better Dessert
Subhodeep Moitra
Proceedings of the 2017 NIPS Workshop on Bayesian Optimization, December 9, 2017, Long Beach, USA (to appear)
Abstract
We present a case study on applying Bayesian Optimization to a complex real-world system; our challenge was to optimize chocolate chip cookies. The process was a mixed-initiative system in which human chefs, human raters, and a machine optimizer all participated in 144 experiments. This process resulted in highly rated cookies that deviated from expectations in some surprising ways: much less sugar in California, and cayenne in Pittsburgh. Our experience highlights the importance of incorporating domain expertise and the value of transfer learning approaches.
TensorFlow Debugger: Debugging Dataflow Graphs for Machine Learning
Eric Nielsen
Michael Salib
Proceedings of the Reliable Machine Learning in the Wild - NIPS 2016 Workshop (2016)
Abstract
Debuggability is important in the development of machine-learning (ML) systems. Several widely used ML libraries, such as TensorFlow and Theano, are based on dataflow graphs. While offering important benefits such as facilitating distributed training, the dataflow graph paradigm makes the debugging of model issues more challenging compared to debugging in the more conventional procedural paradigm. In this paper, we present the design of the TensorFlow Debugger (tfdbg), a specialized debugger for ML models written in TensorFlow. tfdbg provides features to inspect runtime dataflow graphs and the state of the intermediate graph elements ("tensors"), as well as simulated stepping through the graph. We discuss the application of this debugger in development and testing use cases.
What’s your ML test score? A rubric for ML production systems
Eric Nielsen
Michael Salib
Reliable Machine Learning in the Wild - NIPS 2016 Workshop (2016)
Abstract
Using machine learning in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system. But how much testing and monitoring is enough? We present an ML Test Score rubric based on a set of actionable tests to help quantify these issues.
AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech
Yannis Agiomyrgiannakis
NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop (to appear)
Abstract
Developers of text-to-speech synthesizers (TTS) often make use of human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters' mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS only moderately inferior to sampled human ratings, as shown by Pearson and Spearman correlations. When multiple utterances are scored and averaged, a scenario common in synthesizer quality assessment, we achieve correlations comparable to those of human raters. This model has a number of applications, such as the ability to automatically explore the parameter space of a speech synthesizer without requiring a human in the loop. We explore a method of probing what the models have learned.
Machine Learning: The High Interest Credit Card of Technical Debt
Eugene Davydov
Dietmar Ebner
Vinay Chaudhary
Michael Young
SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop)
Abstract
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine-learning-specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
Ad Click Prediction: a View from the Trenches
Michael Young
Dietmar Ebner
Julian Grady
Lan Nie
Eugene Davydov
Sharat Chikkerur
Dan Liu
Arnar Mar Hrafnkelsson
Tom Boulos
Jeremy Kubica
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2013)
Abstract
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
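The FTRL-Proximal update with per-coordinate learning rates is compact enough to sketch for sparse logistic regression. The per-coordinate update below follows the form described in the paper; the hyperparameter defaults and the minimal dict-based sparse storage are illustrative choices, not the production system.

```python
import math

class FTRLProximal:
    """FTRL-Proximal for sparse logistic regression with per-coordinate
    learning rates and L1-induced sparsity."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z, self.n = {}, {}  # per-coordinate state (sparse)

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps this coordinate exactly at zero
        eta_inv = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha
        return -(z - math.copysign(self.l1, z)) / (eta_inv + self.l2)

    def predict(self, features):
        s = sum(self.weight(i) for i in features)
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, features, label):
        p = self.predict(features)
        g = p - label  # logistic-loss gradient for each active feature
        for i in features:
            n_i = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_i + g * g) - math.sqrt(n_i)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n_i + g * g
```

Two properties the abstract highlights are visible here: coordinates with small accumulated signal stay exactly zero (sparsity), and each coordinate's effective learning rate alpha / (beta + sqrt(n_i)) shrinks with its own gradient history rather than a global schedule.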
Large-Scale Learning with Less RAM via Randomization
Michael Young
Proceedings of the 30th International Conference on Machine Learning (ICML) (2013), pp. 10
Abstract
We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this reduces RAM usage by more than 50% during training and by up to 95% when making predictions from a fixed model, with almost no loss in accuracy. We also show that randomized counting can be used to implement per-coordinate learning rates, improving model quality with little additional RAM. We prove these memory-saving methods achieve regret guarantees similar to their exact variants. Empirical evaluation confirms excellent performance, dominating standard approaches across memory versus accuracy trade-offs.
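The key primitive, rounding a float onto a coarse grid stochastically so the result is unbiased in expectation, is easy to sketch. The grid spacing and helper name are illustrative; this shows only the rounding step, not the paper's full low-bit weight encoding.

```python
import math
import random

def randomized_round(x, precision, rng):
    """Round x down or up to a multiple of `precision`, choosing "up"
    with probability proportional to the remainder, so that
    E[randomized_round(x)] == x (unbiased)."""
    lo = math.floor(x / precision) * precision
    p_up = (x - lo) / precision
    return lo + precision if rng.random() < p_up else lo
```

Because the rounding is unbiased, the quantization error behaves like zero-mean noise that averages out over many online updates, which is why the coarse encoding loses almost no accuracy.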
Detecting Adversarial Advertisements in the Wild
Michael Pohl
Bridget Spitznagel
John Hainsworth
Yunkai Zhou
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2011)
Abstract
In a large online advertising system, adversaries may attempt to profit from the creation of low-quality or harmful advertisements. In this paper, we present a large-scale data mining effort that detects and blocks such adversarial advertisements for the benefit and safety of our users. Because both false positives and false negatives have high cost, our deployed system uses a tiered strategy combining automated and semi-automated methods to ensure reliable classification. We also employ strategies to address the challenges of learning from highly skewed data at scale, allocating the effort of human experts, leveraging domain expert knowledge, and independently assessing the effectiveness of our system.
Predicting Bounce Rates in Sponsored Search Advertisements
Robert Malkin
Roberto J. Bayardo
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM (2009), pp. 1325-1334
Large Scale Learning to Rank
NIPS 2009 Workshop on Advances in Ranking
Hidden Technical Debt in Machine Learning Systems
Gary Holt
Eugene Davydov
Dietmar Ebner
Vinay Chaudhary
Michael Young
Jean-François Crespo
NIPS (2015), pp. 2503-2511