Jump to Content
Jacques Pienaar

Jacques Pienaar

Research Areas

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    MLIR: Scaling Compiler Infrastructure for Domain Specific Computation
    Chris Lattner
    Mehdi Amini
    Uday Bondhugula
    River Riddle
    Tatiana Shpeisman
    Nicolas Vasilache
    CGO 2021
    Preview abstract This work presents the MLIR compiler infrastructure, which is a novel approach to building reusable compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduces the cost of building domain specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizer at different levels of abstraction and also across application domains, hardware targets and execution environments. The scientific perspective on these challenges is twofold: 1) evaluating MLIR as an infrastructure that enables new research and educational approaches on programming languages, compilers, code generators, execution environments, hardware acceleration and codesign; and 2) discussing MLIR as a research artifact built for extension and evolution, raising its own design, semantics, algorithmic, system, engineering, and multi-disciplinary challenges. The paper presents the rationale for MLIR, its original design principles, structures and semantics, and validates these by surveying some applications of it. View details
    Preview abstract One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover more optimal tensor computation graphs if we search for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing approaches for tensor graph superoptimization both in production and research frameworks apply substitutions in a sequential manner. Such sequential search methods are sensitive to the order in which the substitutions are applied and often only explore a small fragment of the exponential space of equivalent graphs. This paper presents a novel technique for tensor graph superoptimization that employs equality saturation to apply all possible substitutions at once. We show that our approach can find optimized graphs with up to 16% speedup over state-of-the-art, while spending on average 48x less time optimizing. View details
    Preview abstract MLIR - Multi-Level Intermediate Representation compiler infrastructure - was announced at C4ML last year and has since become an official LLVM subproject. It continues to grow as an open community of academic and industry collaborators building common infrastructure for compilers that operate on high level abstractions. In this talk I will focus on uses of MLIR specific to Machine Learning and in particular the TensorFlow ecosystem. This talk will go beyond current TensorFlow usage of MLIR, covering infrastructure work to enable future use cases (such as, ongoing work on dynamic pattern rewrites) and highlighting some community-driven efforts (in particular the tensor compute working group). View details
    Preview abstract The growing diversity of domain-specific accelerators spans all scales from mobile devices to data centers. It constitutes a global challenge across the high-performance computing stack and is particularly visible in the field of Machine Learning (ML). Program representations and compilers need to support a variety of devices at multiple levels of abstraction, from scalar instructions to coarse-grain parallelism and large scale distribution of computation graphs. This puts great pressure on the construction of both generic and target-specific optimizations, with domain specific language support, interfaces with legacy and future infrastructure, and special attention to future-proofness, modularity and code reuse. This motivates the construction of a new infrastructure, unifying graph representations, ML operators, optimizations at different levels and also across levels, targets, ML frameworks, training and inference, quantization, tightly interacting with runtime systems. Compilers are expected to readily support new applications, to easily port to new hardware, to bridge many levels of abstraction from dynamic, managed languages to vector accelerators and software-managed memories, while exposing high level knobs for autotuning, enable just-in-time operation, provide diagnostics and propagate functional and performance debugging information across the entire stack, and delivering performance close enough to hand-written assembly in most cases. We will share our vision, progress and plans towards the design and public release of such a compiler infrastructure. View details
    Optimization Space Pruning without Regrets
    Ulysse Beaugnon
    Antoine Pouille
    Marc Pouzet
    Proceedings of the 26th International Conference on Compiler Construction, ACM, Austin, TX, USA (2017)
    Preview abstract Many computationally-intensive algorithms benefit from the wide parallelism offered by Graphical Processing Units (GPUs). However, the search for a close-to-optimal implementation remains extremely tedious due to the specialization and complexity of GPU architectures. We present a novel approach to automatically discover the best performing code from a given set of possible implementations. It involves a branch and bound algorithm with two distinctive features: (1) an analytic performance model of a lower bound on the execution time, and (2) the ability to estimate such bounds on a partially-specified implementation. The unique features of this performance model allow to aggressively prune the optimization space without eliminating the best performing implementation. While the space considered in this paper focuses on GPUs, the approach is generic enough to be applied to other architectures. We implemented our algorithm in a tool called Telamon and demonstrate its effectiveness on a huge, architecture-specific and input-sensitive optimization space. The information provided by the performance model also helps to identify ways to enrich the search space to consider better candidates, or to highlight architectural bottlenecks. View details
    GPUCC - An Open-Source GPGPU Compiler
    Jingyue Wu
    Mark Heffernan
    Chris Leary
    Bjarke Roune
    Rob Springer
    Xuetian Weng
    Proceedings of the 2016 International Symposium on Code Generation and Optimization, ACM, New York, NY, pp. 105-116
    Preview abstract Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA’s CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA compatible compiler for high performance computing. It performs various general and CUDA-specific optimizations to generate high performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than NVIDIA’s toolchain (nvcc) and it reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc’s runtime performance is on par for several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), or Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%. View details
    JSWhiz - Static Analysis for JavaScript Memory Leaks
    Proceedings of the 10th annual IEEE/ACM international symposium on Code generation and optimization, IEEE (2013)
    Preview abstract JavaScript is the dominant language for implementing dynamic web pages in browsers. Even though it is standardized, many browsers implement language and browser bindings in different and incompatible ways. As a result, a plethora of web development frameworks were developed to hide cross-browser issues and to ease development of large web applications. An unwelcome side-effect of these frameworks is that they can introduce memory leaks, despite the fact that JavaScript is garbage collected. Memory bloat is a major issue for web applications, as it affects user perceived latency and may even prevent large web applications from running on devices with limited resources. In this paper we present JSWhiz, an extension to the open-source Closure JavaScript compiler. Based on experiences analyzing memory leaks in Gmail, JSWhiz detects five identified common problem patterns. JSWhiz found a total of 89 memory leaks across Google's Gmail, Docs, Spreadsheets, Books, and Closure itself. It contributed significantly in a recent effort to reduce Gmail memory footprint, which resulted in bloat reduction of 75% at the 99th percentile, and by roughly 50% at the median. View details
    No Results Found