Delesley Hutchins

Authored Publications
Block-Recurrent Transformers
Abstract: We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple: it is merely a transformer layer that uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude. Our implementation of recurrence has the same cost in both computation time and parameter count as a conventional transformer layer, but offers dramatically improved perplexity in language modeling tasks over very long sequences. Our model outperforms a long-range Transformer-XL baseline by a wide margin, while running twice as fast. We demonstrate its effectiveness on PG19 (books), arXiv papers, and GitHub source code.
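As a rough illustration of the recurrence described in the abstract, the sketch below is not the paper's implementation: the blockSummary stand-in replaces the real cross-attention over a block, and all names and constants are hypothetical. It shows only the structural idea of carrying state across blocks with an LSTM-style gate, which is what keeps the cost linear in sequence length.

// Toy sketch of a block-recurrent update loop (illustrative only).
// The transformer layer's cross-attention is replaced by blockSummary();
// an LSTM-style gate blends the carried state with each block's update.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Stand-in for the attention output summarizing one block of tokens.
static std::vector<double> blockSummary(const std::vector<double>& block,
                                        std::size_t state_dim) {
  std::vector<double> update(state_dim, 0.0);
  for (std::size_t i = 0; i < block.size(); ++i)
    update[i % state_dim] += block[i];
  return update;
}

int main() {
  const std::size_t block_size = 4;  // tokens processed in parallel per step
  const std::size_t state_dim = 2;   // number of recurrent state values (toy)
  std::vector<double> tokens = {0.1, 0.4, -0.2, 0.3, 0.5, -0.1, 0.2, 0.0};
  std::vector<double> state(state_dim, 0.0);
  const double gate_bias = 1.0;      // biases the gate toward keeping old state

  for (std::size_t start = 0; start < tokens.size(); start += block_size) {
    std::vector<double> block(
        tokens.begin() + start,
        tokens.begin() + std::min(start + block_size, tokens.size()));
    std::vector<double> update = blockSummary(block, state_dim);
    // LSTM-style gate: new_state = g * old_state + (1 - g) * update.
    for (std::size_t d = 0; d < state_dim; ++d) {
      double g = sigmoid(gate_bias + update[d]);
      state[d] = g * state[d] + (1.0 - g) * update[d];
    }
  }
  for (double s : state) std::printf("state: %f\n", s);
  return 0;
}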
    Memorizing Transformers
    Yuhuai Wu
    Markus Rabe
    Christian Szegedy
    ICLR 2022
    Abstract: Language models typically need to be trained or fine-tuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (GitHub), and formal theorems (Isabelle). We show that performance steadily improves as we increase the size of the memory, up to 262K tokens. On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined functions and theorems at test time.
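As a rough illustration of the memory mechanism described in the abstract, the sketch below is not the paper's implementation: an exact top-k scan stands in for approximate kNN, and all names are hypothetical. It shows the core idea of a non-differentiable store of (key, value) pairs that can be appended to and queried at inference time.

// Toy sketch of an external (key, value) memory queried by similarity.
// Entries are appended at inference time; lookup returns the values whose
// keys score highest against the query by dot product.
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

struct Entry {
  std::vector<double> key;
  std::vector<double> value;
};

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Exact top-k retrieval; a real system would use an approximate kNN index.
static std::vector<std::vector<double>> lookup(const std::vector<Entry>& memory,
                                               const std::vector<double>& query,
                                               std::size_t k) {
  std::vector<std::pair<double, std::size_t>> scored;
  for (std::size_t i = 0; i < memory.size(); ++i)
    scored.push_back({dot(query, memory[i].key), i});
  const std::size_t top = std::min(k, scored.size());
  std::partial_sort(scored.begin(), scored.begin() + top, scored.end(),
                    [](const auto& a, const auto& b) { return a.first > b.first; });
  std::vector<std::vector<double>> results;
  for (std::size_t i = 0; i < top; ++i)
    results.push_back(memory[scored[i].second].value);
  return results;
}

int main() {
  std::vector<Entry> memory = {
      {{1.0, 0.0}, {0.9, 0.1}},  // (key, value) pairs stored from past inputs
      {{0.0, 1.0}, {0.2, 0.8}},
      {{0.7, 0.7}, {0.5, 0.5}},
  };
  std::vector<double> query = {1.0, 0.2};
  for (const auto& v : lookup(memory, query, 2))
    std::printf("retrieved value: (%f, %f)\n", v[0], v[1]);
  return 0;
}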
    C/C++ Thread Safety Analysis
    Aaron Ballman
    Dean Sutherland
    2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation, IEEE
    Abstract: Writing multithreaded programs is hard. Static analysis tools can help developers by allowing threading policies to be formally specified and mechanically checked. They essentially provide a static type system for threads, and can detect potential race conditions and deadlocks. This paper describes Clang Thread Safety Analysis, a tool that uses annotations to declare and enforce thread safety policies in C and C++ programs. Clang is a production-quality C++ compiler available on most platforms, and the analysis can be enabled for any build with a simple warning flag: -Wthread-safety. The analysis is deployed on a large scale at Google, where it has provided sufficient value in practice to drive widespread voluntary adoption. Contrary to popular belief, the need for annotations has not been a liability, and even confers some benefits with respect to software evolution and maintenance.
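A minimal example of the annotations in use, in the style of the Clang documentation (the Mutex wrapper, macro names, and BankAccount class are illustrative, not taken from the paper). Compiling with -Wthread-safety makes Clang warn on the unprotected access in unsafeDeposit:

// Thread safety annotations expand to Clang attributes; they are no-ops
// on compilers that do not support the analysis.
#include <mutex>

#if defined(__clang__)
#define THREAD_ANNOTATION(x) __attribute__((x))
#else
#define THREAD_ANNOTATION(x)
#endif
#define CAPABILITY(x) THREAD_ANNOTATION(capability(x))
#define GUARDED_BY(x) THREAD_ANNOTATION(guarded_by(x))
#define REQUIRES(x) THREAD_ANNOTATION(requires_capability(x))
#define ACQUIRE(...) THREAD_ANNOTATION(acquire_capability(__VA_ARGS__))
#define RELEASE(...) THREAD_ANNOTATION(release_capability(__VA_ARGS__))

// A mutex declared as a capability so the analysis can track who holds it.
class CAPABILITY("mutex") Mutex {
 public:
  void Lock() ACQUIRE() { mu_.lock(); }
  void Unlock() RELEASE() { mu_.unlock(); }

 private:
  std::mutex mu_;
};

class BankAccount {
 public:
  // Callers must already hold mu_; the analysis checks this at each call site.
  void depositLocked(int amount) REQUIRES(mu_) { balance_ += amount; }

  void deposit(int amount) {
    mu_.Lock();
    depositLocked(amount);  // OK: mu_ is held here.
    mu_.Unlock();
  }

  void unsafeDeposit(int amount) {
    balance_ += amount;  // -Wthread-safety: writing balance_ requires mu_.
  }

 private:
  Mutex mu_;
  int balance_ GUARDED_BY(mu_) = 0;
};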