Slav Petrov

Slav Petrov is a Distinguished Scientist / Senior Research Director at Google, leading a globally distributed team that conducts natural language understanding and machine learning research. His work has been recognized with multiple Best Paper Awards (ACL'11, NAACL'12, ACL'16) and provides better language understanding to billions of users in a variety of Google products spanning Web Search, Assistant, Ads, Translate & Chrome. Slav is the recipient of the 2014 John Atanasoff Award, presented by the President of Bulgaria, and was a World Champion at RoboCup 2004. For many years, Slav taught Statistical Natural Language Processing at New York University. He holds a PhD from the University of California at Berkeley.

Slav has spent roughly equal parts of his life in Bulgaria, Germany and the US. Whenever Bulgaria plays Germany in soccer, he supports Bulgaria.

See also my personal webpage for more information (including presentation slides).
Authored Publications
    With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.
    Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
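The automatic metric above is only described at a high level here, so the following is a rough sketch, under stated assumptions, of how an attribution check can be approximated with an off-the-shelf natural language inference (NLI) model. The `nli_entailment_prob` helper, the hypothesis template, and the threshold are hypothetical placeholders, not the paper's actual metric.

```python
import re

def nli_entailment_prob(premise: str, hypothesis: str) -> float:
    # Stand-in for a real NLI model: a crude token-overlap heuristic so the
    # sketch runs end to end. Swap in an actual entailment classifier.
    tok = lambda s: set(re.findall(r"\w+", s.lower()))
    prem, hyp = tok(premise), tok(hypothesis)
    return len(prem & hyp) / max(len(hyp), 1)

def is_attributed(question: str, answer: str, passage: str,
                  threshold: float = 0.5) -> bool:
    # Treat the cited passage as the premise; in practice the question and
    # answer would be rewritten into a declarative statement for the NLI model.
    hypothesis = f"{question} {answer}"
    return nli_entailment_prob(passage, hypothesis) >= threshold

def attribution_rate(examples) -> float:
    """examples: iterable of (question, answer, passage) triples."""
    scored = [is_attributed(q, a, p) for q, a, p in examples]
    return sum(scored) / max(len(scored), 1)

print(attribution_rate([
    ("who wrote Hamlet", "William Shakespeare",
     "Hamlet is a tragedy written by William Shakespeare around 1600."),
]))
```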
    PaLM: Scaling Language Modeling with Pathways
    Sharan Narang
    Jacob Devlin
    Maarten Bosma
    Hyung Won Chung
    Sebastian Gehrmann
    Parker Schuh
    Sasha Tsvyashchenko
    Abhishek Rao
    Yi Tay
    Noam Shazeer
    Nan Du
    Reiner Pope
    James Bradbury
    Guy Gur-Ari
    Toju Duke
    Henryk Michalewski
    Xavier Garcia
    Liam Fedus
    David Luan
    Barret Zoph
    Ryan Sepassi
    David Dohan
    Shivani Agrawal
    Mark Omernick
    Marie Pellat
    Aitor Lewkowycz
    Erica Moreira
    Rewon Child
    Oleksandr Polozov
    Zongwei Zhou
    Michele Catasta
    Jason Wei
    arXiv:2204.02311 (2022)
    Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
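As a side note on what "few-shot learning" means operationally in this kind of evaluation: a handful of worked examples are placed in the prompt and the model completes the pattern, with no gradient updates. The sketch below is purely illustrative; `generate` is a hypothetical stand-in for any large language model's text-completion API, not part of PaLM.

```python
def build_few_shot_prompt(examples, query):
    """examples: list of (input, output) pairs; query: the new test input."""
    shots = "\n\n".join(f"Q: {x}\nA: {y}" for x, y in examples)
    return f"{shots}\n\nQ: {query}\nA:"

def generate(prompt: str) -> str:
    # Placeholder completion function; replace with a real LLM call.
    return "<model completion>"

examples = [
    ("What is 2 + 3?", "5"),
    ("What is 7 + 4?", "11"),
]
prompt = build_few_shot_prompt(examples, "What is 9 + 6?")
print(prompt)
print(generate(prompt))
```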
    Large pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode correlations undesired in many applications, like "surgeon" being associated more with "he" than "she". We explore such gendered correlations as a case study, to learn how we can configure and train models to mitigate the risk of encoding unintended associations. We find that it is important to define correlation metrics, since they can reveal differences among models with similar accuracy. Large models have more capacity to encode gendered correlations, but this can be mitigated with general dropout regularization. Counterfactual data augmentation is also effective, and can even reduce correlations not explicitly targeted for mitigation, potentially making it useful beyond gender too. Both techniques yield models with comparable accuracy to unmitigated analogues, and still resist re-learning correlations in fine-tuning.
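For readers unfamiliar with counterfactual data augmentation, the sketch below illustrates the general technique in its simplest form: each training sentence is duplicated with gendered terms swapped, so the model sees both variants. The word list, the approximate pronoun mapping, and the casing rule are illustrative assumptions, not the configuration used in the paper.

```python
import re

# Approximate swap list; pronoun mapping is ambiguous in places
# (e.g. "her" could correspond to "him" or "his").
SWAPS = {"he": "she", "she": "he", "him": "her", "his": "her",
         "her": "his", "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    def swap(match):
        word = match.group(0)
        repl = SWAPS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)

def augment(corpus):
    # Yield the original sentence plus its gender-swapped counterpart.
    for sentence in corpus:
        yield sentence
        yield counterfactual(sentence)

print(list(augment(["The surgeon said he was ready."])))
# -> original plus "The surgeon said she was ready."
```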
    Natural Questions: a Benchmark for Question Answering Research
    Olivia Redfield
    Danielle Epstein
    Illia Polosukhin
    Matthew Kelcey
    Jacob Devlin
    Llion Jones
    Ming-Wei Chang
    Jakob Uszkoreit
    Transactions of the Association for Computational Linguistics (2019)
    We present the Natural Questions corpus, a question answering dataset. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
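To make the data shape concrete, here is a simplified sketch of an example record with 5-way annotations and a naive long-answer check. The official evaluation script applies additional rules (for instance on how many annotators must supply an answer), so treat this only as an illustration, not the released metric.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Annotation:
    long_answer: Optional[str]   # typically a paragraph, or None
    short_answer: Optional[str]  # one or more entities, or None

@dataclass
class Example:
    question: str
    annotations: List[Annotation]  # 5 per dev/test example

def long_answer_correct(example: Example, prediction: Optional[str]) -> bool:
    gold = [a.long_answer for a in example.annotations if a.long_answer]
    if not gold:                  # annotators agreed there is no long answer
        return prediction is None
    return prediction in gold

ex = Example("who founded google",
             [Annotation("Larry Page and Sergey Brin founded Google ...",
                         "Larry Page, Sergey Brin")] * 5)
print(long_answer_correct(ex, "Larry Page and Sergey Brin founded Google ..."))
```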
    Universal Semantic Parsing
    Siva Reddy
    Oscar Tackstrom
    Mark Steedman
    Mirella Lapata
    Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)
    The aim of this document is to provide a list of dependency tags that are to be used for the Arabic dependency annotation task, with examples provided for each tag. The dependency representation is a simple description of the grammatical relationships in a sentence. It represents all sentence relations uniformly typed as dependency relations. The dependencies are all binary relations between a governor (also known as the head) and a dependant (any complement of or modifier to the head).
    CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
    Daniel Zeman
    Martin Popel
    Milan Straka
    Jan Hajic
    Joakim Nivre
    Filip Ginter
    Juhani Luotolahti
    Sampo Pyysalo
    Martin Potthast
    Francis Tyers
    Elena Badmaeva
    Memduh Gokirmak
    Anna Nedoluzhko
    Silvie Cinkova
    Jan Hajic jr.
    Jaroslava Hlavacova
    Václava Kettnerová
    Zdenka Uresova
    Jenna Kanerva
    Stina Ojala
    Anna Missilä
    Christopher D. Manning
    Sebastian Schuster
    Siva Reddy
    Dima Taji
    Nizar Habash
    Herman Leung
    Marie-Catherine de Marneffe
    Manuela Sanguinetti
    Maria Simi
    Hiroshi Kanayama
    Valeria de Paiva
    Kira Droganova
    Héctor Martínez Alonso
    Çagrı Çöltekin
    Umut Sulubacak
    Hans Uszkoreit
    Vivien Macketanz
    Aljoscha Burchardt
    Kim Harris
    Katrin Marheinecke
    Georg Rehm
    Tolga Kayadelen
    Ali Elkahky
    Zhuoran Yu
    Emily Pitler
    Saran Lertpradit
    Michael Mandl
    Jesse Kirchner
    Hector Fernandez Alcalde
    Esha Banerjee
    Antonio Stella
    Atsuko Shimada
    Sookyoung Kwak
    Gustavo Mendonca
    Tatiana Lando
    Rattima Nitisaroj
    Josie Li
    Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
    Natural Language Processing with Small Feed-Forward Networks
    Jan A. Botha
    Emily Pitler
    Anton Bakalov
    Alex Salcianu
    Ryan McDonald
    Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, 2879–2885
    We show that small and shallow feedforward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
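The following is a minimal sketch of the kind of model the paper argues for: a few hashed, embedded features feeding one small hidden layer. All sizes and the feature template are illustrative assumptions, not the paper's settings.

```python
import numpy as np

VOCAB_BUCKETS, EMB_DIM, HIDDEN, N_TAGS = 5000, 16, 64, 12
rng = np.random.default_rng(0)
emb = rng.normal(0, 0.1, (VOCAB_BUCKETS, EMB_DIM))
W1 = rng.normal(0, 0.1, (3 * EMB_DIM, HIDDEN))   # 3 feature slots: prev/word/next
W2 = rng.normal(0, 0.1, (HIDDEN, N_TAGS))

def feature_ids(words, i):
    """Hash a small window of word features into embedding buckets."""
    window = [words[i - 1] if i > 0 else "<s>",
              words[i],
              words[i + 1] if i + 1 < len(words) else "</s>"]
    return [hash(w) % VOCAB_BUCKETS for w in window]

def tag_scores(words, i):
    # Embed, concatenate, one small ReLU hidden layer, linear output.
    x = np.concatenate([emb[f] for f in feature_ids(words, i)])
    h = np.maximum(0, x @ W1)
    return h @ W2                      # unnormalized tag scores

print(tag_scores("the cat sat".split(), 1).shape)   # (12,)
```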
    We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.
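The local-versus-global normalization contrast can be made concrete on a toy transition system, as in the sketch below: a locally normalized model applies a softmax at every step, while a globally normalized model applies a single normalization over whole action sequences (enumerated exhaustively here instead of with a beam, purely for illustration; the scores are made up).

```python
import numpy as np

def local_log_prob(step_scores, actions):
    """Sum of per-step log-softmax probabilities along one action sequence."""
    total = 0.0
    for scores, a in zip(step_scores, actions):
        total += scores[a] - np.logaddexp.reduce(scores)
    return total

def global_log_prob(step_scores, actions, candidates):
    """Sequence score normalized once over all candidate sequences."""
    def seq_score(seq):
        return sum(scores[a] for scores, a in zip(step_scores, seq))
    log_z = np.logaddexp.reduce([seq_score(c) for c in candidates])
    return seq_score(actions) - log_z

step_scores = [np.array([1.0, 0.2]), np.array([0.1, 0.9])]   # 2 steps, 2 actions
candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]
gold = (0, 1)
print(local_log_prob(step_scores, gold),
      global_log_prob(step_scores, gold, candidates))
```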
    Universal Dependencies v1: A Multilingual Treebank Collection
    Joakim Nivre
    Marie-Catherine de Marneffe
    Filip Ginter
    Yoav Goldberg
    Jan Hajic
    Christopher D. Manning
    Ryan McDonald
    Sampo Pyysalo
    Natalia Silveira
    Reut Tsarfaty
    Daniel Zeman
    Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
    Structured Training for Neural Network Transition-Based Parsing
    Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL '15) (2015)
    Grammar as a Foreign Language
    Lukasz Kaiser
    Terry Koo
    Ilya Sutskever
    Geoffrey Hinton
    NIPS (2015)
    Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
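The key preprocessing step for treating parsing as sequence-to-sequence transduction is linearizing each constituency tree into a flat token sequence. The sketch below shows one such linearization; the exact bracket format and the terminal placeholder are illustrative assumptions, not necessarily the paper's format.

```python
def linearize(tree):
    """tree: (label, children) where children are subtrees or word strings."""
    label, children = tree
    out = [f"({label}"]
    for child in children:
        if isinstance(child, str):
            out.append("XX")          # placeholder for the terminal word
        else:
            out.extend(linearize(child))
    out.append(f"){label}")
    return out

tree = ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
print(" ".join(linearize(tree)))
# (S (NP XX )NP (VP (V XX )V )VP )S
```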
    Improved Transition-Based Parsing and Tagging with Neural Networks
    Greg Coppola
    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15)
    Learning Compact Lexicons for CCG Semantic Parsing
    Yoav Artzi
    Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP '14)
    Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
    Jason Mann
    David Zhang
    Lu Yang
    Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Demonstrations), Association for Computational Linguistics (2014)
    Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
    Oscar Tackstrom
    Ryan McDonald
    Joakim Nivre
    Transactions of the Association for Computational Linguistics (2013), 1–12
    Universal Dependency Annotation for Multilingual Parsing
    Ryan McDonald
    Joakim Nivre
    Yoav Goldberg
    Yvonne Quirmbach-Brundage
    Keith Hall
    Oscar Tackstrom
    Claudia Bedini
    Nuria Bertomeu Castello
    Jungmee Lee
    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL '13) (2013)
    Source-Side Classifier Preordering for Machine Translation
    Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13) (2013)
    We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.
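As a toy illustration of classifier-based preordering (not the paper's feature set or model), the sketch below walks a source parse tree and lets a per-node decision function permute children, so that a long-distance reordering falls out of a single swap high in the tree. The trivial swap rule stands in for a trained, feature-rich classifier.

```python
def reorder(node, should_swap):
    """node: (label, children); children are subtrees or word strings."""
    label, children = node
    kids = [c if isinstance(c, str) else reorder(c, should_swap) for c in children]
    if len(kids) == 2 and should_swap(label, children):
        kids = [kids[1], kids[0]]     # long-distance reordering via the tree
    return (label, kids)

def yield_words(node):
    label, children = node
    for c in children:
        yield from ([c] if isinstance(c, str) else yield_words(c))

# Placeholder rule standing in for a trained classifier over rich features:
swap_vp_objects = lambda label, children: label == "VP"

tree = ("S", [("NP", ["she"]), ("VP", [("V", ["reads"]), ("NP", ["books"])])])
print(list(yield_words(reorder(tree, swap_vp_objects))))
# ['she', 'books', 'reads']  (SVO source rendered in SOV-like order)
```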
    Syntactic Annotations for the Google Books Ngram Corpus
    Yuri Lin
    Jean-Baptiste Michel
    Erez Lieberman Aiden
    William Brockman
    Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Demo Papers (ACL '12) (2012)
    Vine Pruning for Efficient Multi-Pass Dependency Parsing
    Alexander Rush
    The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL '12), Best Paper Award
    Coarse-to-fine inference has been shown to be a robust approximate method for improving the efficiency of structured prediction models while preserving their accuracy. We propose a multi-pass coarse-to-fine architecture for dependency parsing using linear-time vine pruning and structured prediction cascades. Our first-, second-, and third-order models achieve accuracies comparable to those of their unpruned counterparts, while exploring only a fraction of the search space. We observe speed-ups of up to two orders of magnitude compared to exhaustive search. Our pruned third-order model is twice as fast as an unpruned first-order model and also compares favorably to a state-of-the-art transition-based parser for multiple languages.
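A schematic sketch of the vine-pruning idea: a cheap first pass only considers "vine" arcs, i.e. arcs whose head and modifier are within a small distance of each other (plus root attachments), and later, more expensive passes score only the arcs that survive. The scorer and threshold below are toy placeholders, not the learned cascade from the paper.

```python
def vine_candidate_arcs(n_words, max_len=3):
    """Arcs (head, modifier) allowed in the coarse pass; 0 is the root."""
    arcs = []
    for mod in range(1, n_words + 1):
        arcs.append((0, mod))                       # root arcs always considered
        for head in range(1, n_words + 1):
            if head != mod and abs(head - mod) <= max_len:
                arcs.append((head, mod))
    return arcs

def prune(arcs, coarse_score, keep_threshold=0.0):
    """Keep only arcs whose cheap first-pass score clears the threshold."""
    return [a for a in arcs if coarse_score(a) >= keep_threshold]

# Toy coarse scorer: prefer short arcs (stand-in for a learned model).
coarse = lambda arc: 1.0 - 0.2 * abs(arc[0] - arc[1])
surviving = prune(vine_candidate_arcs(8), coarse)
print(len(vine_candidate_arcs(8)), "->", len(surviving))
```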
    A Universal Part-of-Speech Tagset
    Ryan McDonald
    Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC '12) (2012)
    Google's Hybrid Approach to Research
    Alfred Spector
    Communications of the ACM, vol. 55 Issue 7 (2012), pp. 34-37
    In this viewpoint, we describe how we organize computer science research at Google. We focus on how we integrate research and development and discuss the benefits and risks of our approach.
    Using Search-Logs to Improve Query Tagging
    Keith B. Hall
    Ryan McDonald
    Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (ACL '12) (2012)
    Overview of the 2012 Shared Task on Parsing the Web
    Ryan McDonald
    Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) (2012)
    Training Structured Prediction Models with Extrinsic Loss Functions
    Keith Hall
    Ryan McDonald
    Domain Adaptation Workshop at NIPS 2011
    Training a Parser for Machine Translation Reordering
    Jason Katz-Brown
    Ryan McDonald
    Franz Och
    David Talbot
    Hiroshi Ichikawa
    Masakazu Seno
    Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11)
    We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.
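One simple way to realize such a regime, sketched under assumptions below, is to re-rank the parser's k-best list with the extrinsic metric and retrain on the preferred parses; the k-best parser and scorer here are toy stand-ins (in the paper the signal comes from reordering quality inside an MT system), so this is an illustration of the idea rather than the paper's algorithm.

```python
def select_training_parses(sentences, kbest_parse, extrinsic_score, k=8):
    """For each sentence, pick the candidate parse the extrinsic metric prefers."""
    selected = []
    for sentence in sentences:
        candidates = kbest_parse(sentence, k)
        selected.append(max(candidates, key=extrinsic_score))
    return selected      # retrain the parser on these as if they were gold

# Toy stand-ins so the sketch runs: "parses" are just bracket strings.
kbest = lambda s, k: [f"(parse-{i} {s})" for i in range(k)]
score = lambda parse: -len(parse)        # pretend shorter is extrinsically better
print(select_training_parses(["the cat sat"], kbest, score)[0])
```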
    Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
    Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11) (2011), Best Paper Award
    We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable for a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as constraints in an unsupervised model. Across six European languages, our approach results in an average absolute improvement of 9.7% over the state-of-the-art baseline, and 17.0% over vanilla hidden Markov models induced with EM.
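The propagation step can be pictured with a small sketch: vertices with projected tag distributions spread probability mass to their graph neighbors until the distributions stabilize. Graph construction and the exact update rule in the paper are richer; this only shows the basic loop, with a made-up graph and seed distribution.

```python
def propagate(neighbors, seeds, n_tags, iterations=10, alpha=0.5):
    """neighbors: {vertex: [vertex, ...]}; seeds: {vertex: tag distribution}."""
    uniform = [1.0 / n_tags] * n_tags
    dist = {v: list(seeds.get(v, uniform)) for v in neighbors}
    for _ in range(iterations):
        new = {}
        for v, nbrs in neighbors.items():
            avg = [sum(dist[u][t] for u in nbrs) / max(len(nbrs), 1)
                   for t in range(n_tags)]
            if v in seeds:                       # seed vertices stay anchored
                new[v] = [alpha * s + (1 - alpha) * a
                          for s, a in zip(seeds[v], avg)]
            else:
                new[v] = avg
        dist = new
    return dist

graph = {"dog": ["cat"], "cat": ["dog", "runs"], "runs": ["cat"]}
seeds = {"dog": [1.0, 0.0]}                      # e.g. [NOUN, VERB]
print(propagate(graph, seeds, n_tags=2))
```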
    Multi-Source Transfer of Delexicalized Dependency Parsers
    Ryan McDonald
    Keith B. Hall
    Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11)
    Efficient Parallel CKY Parsing on GPUs
    Youngmin Yi
    Chao-Yue Lai
    Kurt Keutzer
    Proceedings of the International Conference on Parsing Technologies (IWPT '11) (2011)
    Self-training with Products of Latent Variable Grammars
    Zhongqiang Huang
    Mary Harper
    Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
    Learning Better Monolingual Models with Unannotated Bilingual Text
    David Burkett
    Dan Klein
    Fourteenth Conference on Computational Natural Language Learning (CoNLL '10) (2010)
    Uptraining for Accurate Deterministic Question Parsing
    Michael Ringgaard
    Hiyan Alshawi
    Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
    Products of Random Latent Variable Grammars
    Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL/HLT '10) (2010)
    Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
    Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10)
    Generative and Discriminative Latent Variable Grammars
    The Generative and Discriminative Learning Interface Workshop at NIPS 2009
    Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs
    Alexandre Bouchard-Côté
    Dan Klein
    Advances in Neural Information Processing Systems 22 (NIPS '09) (2009)
    Coarse-to-Fine Natural Language Processing
    Ph.D. Thesis, University of California at Berkeley (2009)
    Coarse-to-Fine Syntactic Machine Translation using Language Projections
    Aria Haghighi
    Dan Klein
    Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Honolulu, Hawaii, pp. 108-116
    Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
    Dan Klein
    Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Honolulu, Hawaii, pp. 867-876
    Parsing German with Latent Variable Grammars
    Dan Klein
    Proceedings of the Workshop on Parsing German at ACL '08, Association for Computational Linguistics, Columbus, Ohio (2008), pp. 33-39
    Efficient Sentence Segmentation Using Syntactic Features
    Benoit Favre
    Dilek Hakkani-Tür
    Dan Klein
    Spoken Language Technologies (SLT), Goa, India (2008)
    Discriminative Log-Linear Grammars with Latent Variables
    Dan Klein
    Advances in Neural Information Processing Systems 20 (NIPS), MIT Press, Cambridge, MA (2008), pp. 1153-1160
    The Infinite PCFG Using Hierarchical Dirichlet Processes
    Percy Liang
    Michael Jordan
    Dan Klein
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 688-697
    Improved Inference for Unlexicalized Parsing
    Dan Klein
    Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, Association for Computational Linguistics, Rochester, New York, pp. 404-411
    Learning and Inference for Hierarchically Split PCFGs
    Dan Klein
    AAAI 2007 (Nectar Track)
    Learning Structured Models for Phone Recognition
    Adam Pauls
    Dan Klein
    Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 897-905
    Learning Accurate, Compact, and Interpretable Tree Annotation
    Leon Barrett
    Romain Thibaux
    Dan Klein
    Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (ACL/COLING), Association for Computational Linguistics, Sydney, Australia (2006), pp. 433-440
    Detecting Categories in News Video using Acoustic, Speech and Image Features
    Arlo Faria
    Pascal Michaillat
    Alexander Berg
    Andreas Stolcke
    Dan Klein
    Jitendra Malik
    Proceedings of the TREC Video Retrieval Evaluation (TRECVID 2006)
    Non-Local Modeling with a Mixture of PCFGs
    Leon Barrett
    Dan Klein
    Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), Association for Computational Linguistics, New York City (2006), pp. 14-20
    3D Tracking = Classification + Interpolation
    Carlo Tomasi
    Arvind Sastry
    Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV) (2003)