Ramprasad Venkataraman

Ram works on large-scale distributed software systems for resilient, near-real-time event processing at Google. His interests lie at the confluence of parallel algorithms for high-performance computing applications, runtime systems for managing concurrency, scalability and performance. He is excited by trends at both ends of the computing spectrum, from multicore devices to extreme-scale Top500 supercomputers. Prior to joining Google, Ram worked on scientific and numerical HPC. He has contributed to the Charm++ parallel programming framework and to petascale computational software for several scientific domains.
Authored Publications
    TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages
    Lukasz Wesolowski
    A Gupta
    Jae-Seung Yeom
    Keith Bisset
    Yanhua Sun
    Pritish Jetley
    Thomas Quinn
    Laxmikant Kale
    International Conference on Parallel Processing (2014)
    Abstract: Fine-grained communication in supercomputing applications often limits performance through high communication overhead and poor utilization of network bandwidth. This paper presents the Topological Routing and Aggregation Module (TRAM), a library that optimizes fine-grained communication performance by routing and dynamically combining short messages. TRAM collects units of fine-grained communication from the application and combines them into aggregated messages with a common intermediate destination. It routes these messages along a virtual mesh topology mapped onto the physical topology of the network. TRAM improves network bandwidth utilization and reduces communication overhead. It is particularly effective in optimizing patterns with global communication and large message counts, such as all-to-all and many-to-many, as well as sparse, irregular, dynamic or data-dependent patterns. We demonstrate how TRAM improves performance through theoretical analysis and experimental verification using benchmarks and scientific applications. We present speedups on petascale systems of 6x for communication benchmarks and up to 4x for applications.
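The aggregation idea described in the TRAM abstract can be illustrated with a minimal Python sketch. It is not TRAM's API (TRAM is a Charm++ library). The function names `next_hop` and `aggregate`, the 2D row-major mesh layout, and the column-first routing order are all assumptions chosen for illustration.

```python
from collections import defaultdict


def next_hop(src, dst, cols):
    """Hypothetical dimension-ordered routing on a 2D virtual mesh.

    Processes are numbered row-major on a mesh with `cols` columns.
    A message first moves within the sender's row to the destination
    column, then within that column to the destination.
    """
    src_row, src_col = divmod(src, cols)
    _, dst_col = divmod(dst, cols)
    if src_col != dst_col:
        return src_row * cols + dst_col  # intermediate peer in the sender's row
    return dst  # already in the right column: deliver directly


def aggregate(src, items, cols):
    """Group (dst, payload) items sent by `src` into one buffer per next hop."""
    buffers = defaultdict(list)
    for dst, payload in items:
        buffers[next_hop(src, dst, cols)].append((dst, payload))
    return dict(buffers)


if __name__ == "__main__":
    # Process 0 on a mesh with 4 columns sends four small items.
    items = [(5, "a"), (9, "b"), (13, "c"), (6, "d")]
    for hop, batch in aggregate(0, items, cols=4).items():
        print(f"0 -> {hop}: {len(batch)} item(s)")
```

In this sketch the items bound for processes 5, 9 and 13 share the intermediate destination 1, so they travel as one aggregated message instead of three. Process 1 would then apply the same grouping to forward each item along its column.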
    Parallel Branch-and-Bound for Two-Stage Stochastic Integer Optimization
    Akhil Langer
    Udatta Palekar
    Laxmikant Kale
    IEEE International Conference on High Performance Computing (HiPC) (2013), pp. 266-275
    OpenAtom: Ab-initio Molecular Dynamics for Petascale Platforms
    Glenn Martyna
    Eric Bohm
    Laxmikant Kale
    Abhinav Bhatele
    Parallel Science and Engineering Applications: The Charm++ Approach, CRC Press (2013), pp. 79-104
    Mapping Dense LU Factorization on Multicore Supercomputer Nodes
    Jonathan Lifflander
    Phil Miller
    Anshu Arya
    T Jones
    Laxmikant Kale
    IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2012), pp. 596-606
    Charm++ for Productivity and Performance: A Submission to the 2011 HPC Class II Challenge
    Laxmikant Kale
    et al.
    University of Illinois (2011)