Data Mining and Modeling

The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, questions in practice are rarely so clean that one can just use an out-of-the-box algorithm. A big challenge lies in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.

Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.

Recent Publications

First Passage Percolation with Queried Hints
Yiheng Shen
Ali Sinop
Kritkorn Karntikoon
Aaron Schild
AISTATS (2024)
Shorts vs. Regular Videos on YouTube: A Comparative Analysis of User Engagement and Content Creation Trends
Caroline Violot
Mathias Humbert
Tugrulcan Elmas
ACM Web Science Conference 2024 (WEBSCI24) (2024)
LinguaMeta: Unified Metadata for Thousands of Languages
Uche Okonkwo
Emily Drummond
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)