Aleksandra Faust

Aleksandra Faust is a Senior Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain. Previously, Aleksandra founded and led Task and Motion Planning research in Robotics at Google and machine learning for self-driving car planning and controls at Waymo, and was a senior researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction), and a Master's in Computer Science from the University of Illinois at Urbana-Champaign. Her research interests include safe and scalable reinforcement learning, learning to learn, motion planning, decision-making, and robot behavior. Aleksandra won the IEEE RAS Early Career Award for Industry, the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in the period of 2011-2014, and was named Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, ZDNet, and VentureBeat, and was awarded Best Paper in Service Robotics at ICRA 2018, Best Paper in Reinforcement Learning for Real Life (RL4RL) at ICML 2019, and Best Paper of IEEE Computer Architecture Letters in 2020.
Authored Publications
    Multimodal Web Navigation with Instruction-Finetuned Foundation Models
    Hiroki Furuta
    Ofir Nachum
    Yutaka Matsuo
    Shane Gu
    Izzeddin Gur
    International Conference on Learning Representations (ICLR) (2024)
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and on domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data. In this work, we study data-driven offline training for web agents with vision-language foundation models. We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages and outputs web navigation actions, such as click and type. WebGUM is trained by jointly finetuning an instruction-finetuned language model and a vision encoder with temporal and local perception on a large corpus of demonstrations. We empirically demonstrate that this recipe improves the agent's grounded multimodal perception, HTML comprehension, and multi-step reasoning, outperforming prior work by a significant margin. On MiniWoB, we improve over the previous best offline methods by more than 45.8%, even outperforming online-finetuned SoTA, humans, and a GPT-4-based agent. On the WebShop benchmark, our 3-billion-parameter model achieves superior performance to the existing SoTA, PaLM-540B. Furthermore, WebGUM exhibits strong positive transfer to real-world planning tasks on Mind2Web. We also collect 347K high-quality demonstrations using our trained models, 38 times more than prior work, and make them available to promote future research in this direction.
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.
    Multi-Task Learning with Sequence-Conditioned Transporter Networks
    Michael Lim
    Andy Zeng
    Brian Andrew Ichter
    Maryam Bandari
    Erwin Johan Coumans
    Claire Tomlin
    Stefan Schaal
    International Conference on Robotics and Automation 2022, IEEE (to appear)
Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new benchmark suite specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long-horizon problems. Our analysis suggests that the new framework not only significantly improves pick-and-place performance on the ten novel multi-task benchmark problems, but that multi-task learning with weighted sampling can also vastly improve learning and agent performance on individual tasks.
    QuaRL: Quantization for Fast and Environmentally Sustainable Reinforcement Learning
    Gabe Barth-Maron
    Maximilian Lam
    Sharad Chitlangia
    Srivatsan Krishnan
    Vijay Janapa Reddi
    Zishen Wan
    Transactions on Machine Learning Research (TMLR) 2022 (2022)
Deep reinforcement learning continues to show tremendous potential in achieving task-level autonomy; however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, ActorQ, to speed up actor-learner distributed RL training. ActorQ leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system, ActorQ, demonstrates end-to-end speedups of 1.5x-2.5x, and faster convergence over full-precision training on a range of tasks (DeepMind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (kg of CO2) of ActorQ versus standard reinforcement learning on various tasks. Across various settings, we show that ActorQ enables more environmentally friendly reinforcement learning by achieving 2.8x less carbon emission and energy compared to training RL agents in full precision. Finally, we demonstrate empirically that aggressively quantized RL policies (to as few as 4/5 bits) enable significant speedups on quantization-friendly (supporting native quantization) resource-constrained edge devices, without degrading accuracy. We believe that this is the first of many future works on enabling computationally energy-efficient and sustainable reinforcement learning. The source code for QuaRL is publicly available at: https://bit.ly/quarl-tmlr
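To make the actor-side quantization idea concrete, here is a minimal numpy sketch of 8-bit weight quantization for a toy linear policy. The helper names, sizes, and symmetric quantization scheme are illustrative assumptions, not the ActorQ implementation.

```python
# Minimal sketch (not the paper's ActorQ code): a learner keeps full-precision
# policy weights; actors act with an 8-bit quantized copy for fast rollouts.
import numpy as np

def quantize_int8(w):
    """Uniform symmetric quantization of a weight matrix to int8."""
    scale = np.max(np.abs(w)) / 127.0 + 1e-12
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2)).astype(np.float32)   # obs_dim=4 -> action_dim=2

# Actor side: act with the dequantized int8 copy of the learner's weights.
Wq, s = quantize_int8(W)
obs = rng.normal(size=4).astype(np.float32)
action_fp32 = obs @ W
action_int8 = obs @ dequantize(Wq, s)
print("max action error from 8-bit actor:", np.max(np.abs(action_fp32 - action_int8)))
```

In an actor-learner setup, the learner would keep the float32 weights and periodically ship the int8 copy to actors for data collection.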
    Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization
    Sungryull Sohn
    Hyunjae Woo
    Jongwook Choi
Lyubing Qiang
    Izzeddin Gur
    Honglak Lee
    Uncertainty in Artificial Intelligence (UAI) (2022) (to appear)
We tackle real-world problems with complex structures beyond pixel-based games or simulators. We formulate these as few-shot reinforcement learning problems where a task is characterized by a subtask graph that defines a set of subtasks and their dependencies, which are unknown to the agent. Unlike previous meta-RL methods that try to directly infer an unstructured task embedding, our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks, and uses it as a prior to improve task inference at test time. Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks than various existing algorithms such as meta reinforcement learning, hierarchical reinforcement learning, and other heuristic agents.
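As a toy illustration of what a subtask graph encodes, the sketch below represents subtasks with AND-style preconditions and computes which subtasks are currently eligible. The example tasks and helper are hypothetical and only hint at the structure MTSGI infers.

```python
# Illustrative sketch only: a subtask graph as AND-preconditions, and a helper
# that returns the subtasks whose prerequisites have already been completed.
subtask_graph = {          # subtask -> set of prerequisite subtasks
    "get_wood": set(),
    "get_stone": set(),
    "make_axe": {"get_wood", "get_stone"},
    "chop_tree": {"make_axe"},
}

def eligible_subtasks(graph, completed):
    """Subtasks not yet done whose prerequisites are all completed."""
    return [s for s, prereqs in graph.items()
            if s not in completed and prereqs <= completed]

print(eligible_subtasks(subtask_graph, completed=set()))            # entry points
print(eligible_subtasks(subtask_graph, {"get_wood", "get_stone"}))  # ['make_axe']
```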
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first-stage landmark detector and a second-stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8 billion images, we identify 1.1 million English, Hindi, and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, more diverse paths obtain 61-64% SRs in three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.
    Tiny Robot Learning: Challenges and Directions for Machine Learning in Resource-Constrained Robots
    Sabrina Neuman
    Brian Plancher
    Bardienus Pieter Duisterhof
    Srivatsan Krishnan
    Colby R. Banbury
    Mark Mazumder
    Shvetank Prakash
    Jason Jabbour
    Guido C. H. E. de Croon
    Vijay Janapa Reddi
    IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) special session on Low Power Autonomous Systems (2022) (to appear)
Machine learning (ML) has become a pervasive tool across computing systems. An emerging application that stress-tests the challenges of ML system design is tiny robot learning, the deployment of ML on resource-constrained low-cost autonomous robots. Tiny robot learning lies at the intersection of embedded systems, robotics, and ML, compounding the challenges of these domains. Tiny robot learning is subject to challenges from size, weight, area, and power (SWAP) constraints; sensor, actuator, and compute hardware limitations; end-to-end system tradeoffs; and a large diversity of possible deployment scenarios. Tiny robot learning requires ML models to be designed with these challenges in mind, providing a crucible that reveals the necessity of holistic ML system design and automated end-to-end design tools for agile development. This paper gives a brief survey of the tiny robot learning space, elaborates on key challenges, and proposes promising opportunities for future work in ML system design.
    The Role of Compute in Autonomous Micro Aerial Vehicles: Optimizing for Flight Time and Energy Efficiency
    Behzad Boroujerdian
    Hasan Genc
    Srivatsan Krishnan
    Bardienus Pieter Duisterhof
    Brian Plancher
    Kayvan Mansoorshahi
    Marcelino Almeida
    Wenzhi Cui
    Vijay Janapa Reddi
    ACM Transactions on Computer Systems (TOCS) (2022) (to appear)
Autonomous and mobile cyber-physical machines are becoming an inevitable part of our future. In particular, Micro Aerial Vehicles (MAVs) have seen a resurgence in activity. With multiple use cases, such as surveillance, search and rescue, package delivery, and more, these unmanned aerial systems are on the cusp of demonstrating their full potential. Despite such promises, these systems face many challenges, one of the most prominent of which is their low endurance caused by their limited onboard energy. Since the success of a mission depends on whether the drone can finish it within this duration and before it runs out of battery, improving both the time and energy associated with the mission is of high importance. Such improvements have traditionally been sought through better algorithms. But our premise is that more powerful and efficient onboard compute can also address the problem. In this paper, we investigate how the compute subsystem, in a cyber-physical mobile machine such as a Micro Aerial Vehicle, can impact mission time (time to complete a mission) and energy. Specifically, we pose the question as “what is the role of computing for cyber-physical mobile robots?” We show that compute and motion are tightly intertwined, and as such a close examination of cyber and physical processes and their impact on one another is necessary. We show different “impact paths” through which compute impacts mission metrics and examine them using a combination of analytical models, simulation, and micro and end-to-end benchmarking. To enable similar studies, we open-sourced MAVBench, our tool-set, which consists of (1) a closed-loop real-time feedback simulator and (2) an end-to-end benchmark suite comprised of state-of-the-art kernels. By combining MAVBench, analytical modeling, and an understanding of various compute impacts, we show up to 2X and 1.8X improvements in mission time and mission energy for two optimization case studies. Our investigations, as well as our optimizations, show that cyber-physical co-design, a methodology with which both the cyber and physical processes/quantities of the robot are developed with consideration of one another, similar to hardware-software co-design, is necessary for arriving at the design of the optimal robot.
    Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles
    David Brooks
    Gu-Yeon Wei
    Kshitij Bhardwaj
    Paul Whatmough
    Srivatsan Krishnan
    Vijay Janapa Reddi
    Zishen Wan
55th IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE (2022) (to appear)
Building domain-specific accelerators is becoming increasingly paramount to meet the high-performance requirements under stringent power and real-time constraints. However, emerging application domains like autonomous vehicles are complex systems, where the constraints extend beyond just the computing stack. Manually selecting and navigating the design space to design custom and efficient domain-specific SoCs (DSSoC) is tedious and expensive. As such, there is a need for automated DSSoC design methodologies. In this paper, we use agile and autonomous UAVs as a case study for understanding how to automate the design of domain-specific SoCs for autonomous vehicles. Architecting a UAV DSSoC requires considering parameters such as sensor rate, compute throughput, and other physical characteristics (e.g., payload weight, thrust-to-weight ratio) that affect overall performance. Iterating over the many component choices results in a combinatorial explosion of the number of possible combinations: from tens of thousands to billions, depending on implementation details. To navigate the DSSoC design space efficiently, we introduce AutoPilot, a systematic methodology for automatically designing DSSoCs for autonomous UAVs. AutoPilot uses machine learning to navigate the large DSSoC design space and automatically select a combination of autonomy algorithm and hardware accelerator while considering the cross-product effect across different UAV components. AutoPilot consistently outperforms general-purpose hardware selections like Xavier NX and Jetson TX2, as well as dedicated hardware accelerators built for autonomous UAVs. DSSoC designs generated by AutoPilot increase the number of missions on average by up to 2.25x, 1.62x, and 1.43x for nano, micro, and mini-UAVs, respectively, over baselines. We also discuss how AutoPilot can be extended to other related autonomous vehicles using the same set of principles.
    Roofline Model for UAVs: A Bottleneck Analysis Tool for Onboard Compute Characterization of Autonomous Unmanned Aerial Vehicles
    Srivatsan Krishnan
    Zishen Wan
    Kshitij Bhardwaj
    Ninad Jadhav
    Vijay Janapa Reddi
    IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2022)
We introduce an early-phase bottleneck analysis and characterization model called the F-1 for designing computing systems that target autonomous Unmanned Aerial Vehicles (UAVs). The model provides insights by exploiting the fundamental relationships between various components in the autonomous UAV, such as sensor, compute, and body dynamics. To guarantee safe operation while maximizing the performance (e.g., velocity) of the UAV, the compute, sensor, and other mechanical properties must be carefully selected or designed. The F-1 model provides visual insights that can aid a system architect in understanding the optimal compute design or selection for autonomous UAVs. The model is experimentally validated using real UAVs, and its error is between 5.1% and 9.5% compared to real-world flight tests. An interactive web-based tool for the F-1 model called Skyline is freely available at: https://bit.ly/skyline-tool
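The flavor of such a bottleneck model can be conveyed with a toy back-of-the-envelope calculation: bound the maximum safe velocity by requiring the vehicle to stop within its sensing range, given sensor rate, compute latency, and braking deceleration. The formula and numbers below are illustrative assumptions, not the paper's F-1 equations.

```python
# Toy bottleneck estimate (illustrative only): the UAV must be able to stop
# within its sensing range, so maximum safe velocity depends on sensor range,
# end-to-end response latency, and braking deceleration.
import math

def max_safe_velocity(sense_range_m, sensor_hz, compute_latency_s, decel_mps2):
    t_react = 1.0 / sensor_hz + compute_latency_s        # perception + compute delay
    # Solve v*t_react + v^2/(2a) = sense_range for v (positive root of quadratic).
    a, b, c = 1.0 / (2 * decel_mps2), t_react, -sense_range_m
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

# Faster compute (lower latency) raises the velocity the vehicle can safely fly.
for latency in (0.200, 0.050, 0.010):
    v = max_safe_velocity(sense_range_m=10.0, sensor_hz=30,
                          compute_latency_s=latency, decel_mps2=5.0)
    print(f"compute latency {latency*1000:5.0f} ms -> max safe velocity {v:4.1f} m/s")
```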
The paper proposes a novel training approach for a neural network to perform switching among an array of computationally generated stochastic optimal feedback controllers. The training is based on the outputs of off-line computed lookup-table metric (LTM) values that store information about individual controller performance. Our study is based on the problem of navigating a bicycle kinematic model through a sequence of gates; a more traditional approach to the training is based on kinematic variables (KVs) describing the bicycle-gate relative position. We compare the LTM- and KV-based training approaches on the navigation problem and find that LTM training converges faster and with less variation than KV-based training. Our results include numerical simulations illustrating the work of the LTM-trained neural network switching policy.
    Visual Navigation Among Humans With Optimal Control as a Supervisor
    Varun Tolani
    Somil Bansal
    Claire Tomlin
    IEEE Robotics and Automation Letters (RA-L) (2021) (to appear)
Real-world visual navigation requires robots to operate in unfamiliar, human-occupied dynamic environments. Navigation around humans is especially difficult because it requires anticipating their future motion, which can be quite challenging. We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans based only on monocular, first-person RGB images. Our approach is enabled by our novel data-generation tool, HumANav, that allows for photorealistic renderings of indoor environment scenes with humans in them, which are then used to train the perception module entirely in simulation. Through simulations and experiments on a mobile robot, we demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion, generalize to previously unseen environments and human behaviors, and transfer directly from simulation to reality. Videos describing our approach and experiments, as well as a live demo of HumANav, are available on the project website.
    Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments
    Felipe Felix Arias
    Brian Andrew Ichter
    Nancy M. Amato
    IEEE International Conference on Robotics and Automation (ICRA) (2021) (to appear)
Motion planning among dynamic obstacles is an essential capability for navigation in the real world. Sampling-based motion planning algorithms find solutions by approximating the robot’s configuration space through a graph representation, predicting or computing obstacles’ trajectories, and finding feasible paths via a pathfinding algorithm. In this work, we seek to improve the performance of these subproblems by identifying regions critical to dynamic environment navigation and leveraging them to construct sparse probabilistic roadmaps. Motion planning and pathfinding algorithms should allow robots to prevent encounters with obstacles, irrespective of their trajectories, by being conscious of spatial context cues such as the location of chokepoints (e.g., doorways). Thus, we propose a self-supervised methodology for learning to identify regions frequently used for obstacle avoidance from local environment features. As an application of this concept, we leverage a neural network to generate hierarchical probabilistic roadmaps termed Avoidance Critical Probabilistic Roadmaps (ACPRM). These roadmaps contain motion structures that enable efficient obstacle avoidance, reduce the search and planning space, and increase a roadmap’s reusability and coverage. ACPRMs are demonstrated to achieve up to five orders of magnitude improvement over grid-sampling in the multi-agent setting and up to ten orders of magnitude over a competitive baseline in the multi-query setting.
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
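A minimal sketch of the kind of attention-matching incentive described above: penalize the divergence between two agents' attention distributions. The symmetric-KL penalty and array shapes are illustrative choices, not the paper's architecture.

```python
# Sketch of a joint-attention incentive (illustrative, not the paper's model):
# penalize the divergence between two agents' normalized attention maps so that
# agents are rewarded for attending to the same parts of the environment.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def joint_attention_penalty(logits_a, logits_b, eps=1e-8):
    """Symmetric KL between the two agents' attention distributions."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return 0.5 * (kl_pq + kl_qp)

rng = np.random.default_rng(0)
a, b = rng.normal(size=16), rng.normal(size=16)   # attention logits over 16 cells
print("misaligned:", joint_attention_penalty(a, b))
print("aligned:   ", joint_attention_penalty(a, a))   # zero when attention matches
# The penalty would be folded into each agent's RL objective as an auxiliary term.
```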
    Environment Generation for Zero-Shot Compositional Reinforcement Learning
    Izzeddin Gur
    Yingjie Miao
    Jongwook Choi
    Manoj Tiwari
    Honglak Lee
    Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) (2021)
Many real-world problems are compositional – solving them requires completing interdependent sub-tasks, either in series or in parallel, that can be represented as a dependency graph. Deep reinforcement learning (RL) agents often struggle to learn such complex tasks due to the long time horizons and sparse rewards. To address this problem, we present Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent’s current skill level. This automatic curriculum not only enables the agent to learn more complex tasks than it could have otherwise, but also selects tasks where the agent’s performance is weak, enhancing its robustness and ability to generalize zero-shot to unseen tasks at test time. We analyze why current environment generation techniques are insufficient for the problem of generating compositional tasks, and propose a new algorithm that addresses these issues. Our results assess learning and generalization across multiple compositional tasks, including the real-world problem of learning to navigate and interact with web pages. We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments. We contribute two new benchmark frameworks for generating compositional tasks, compositional MiniGrid and gMiniWoB for web navigation. CoDE yields a 4x higher success rate than the strongest baseline, and demonstrates strong performance on real websites after learning on 3500 primitive tasks.
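The curriculum intuition can be sketched with a toy generator that composes more sub-tasks when the agent's recent success rate is high and fewer when it struggles. The class, thresholds, and primitive task names below are hypothetical stand-ins for the learned Generator agent.

```python
# Toy curriculum sketch (not the CoDE algorithm): the generator widens the task
# (more composed sub-tasks) when the agent's recent success rate is high, and
# narrows it when the agent struggles.
import random
from collections import deque

class AdaptiveTaskGenerator:
    def __init__(self, primitives, window=20):
        self.primitives = primitives
        self.difficulty = 1                       # number of sub-tasks to compose
        self.recent = deque(maxlen=window)

    def sample_task(self):
        k = min(self.difficulty, len(self.primitives))
        return random.sample(self.primitives, k)  # a set of sub-tasks to complete

    def report(self, success):
        self.recent.append(success)
        rate = sum(self.recent) / len(self.recent)
        if rate > 0.8 and self.difficulty < len(self.primitives):
            self.difficulty += 1                  # agent is ready for harder tasks
        elif rate < 0.3 and self.difficulty > 1:
            self.difficulty -= 1

gen = AdaptiveTaskGenerator(["login", "search", "fill_form", "add_to_cart", "checkout"])
for step in range(100):
    task = gen.sample_task()
    gen.report(success=random.random() < 0.9)     # stand-in for an RL rollout
print("difficulty after 100 fake episodes:", gen.difficulty)
```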
    Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation
    Srivatsan Krishnan
    Behzad Boroujerdian
    William Fu
    Vijay Janapa Reddi
    Machine Learning, vol. 110 (2021), pp. 2501-2540
We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
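The hardware-in-the-loop latency idea can be sketched as a small action-delay wrapper: sample a latency from the target platform's measured distribution and release actions only after the corresponding number of control steps. The class and numbers below are illustrative, not Air Learning's code.

```python
# Minimal sketch of latency injection (illustrative): actions are delayed by a
# latency sampled from the target platform's measured distribution, so the
# trained policy sees on-robot timing while still training in simulation.
import random
from collections import deque

class LatencyDelayedActions:
    def __init__(self, measured_latencies_s, control_dt_s, default_action=0):
        self.latencies = measured_latencies_s     # e.g. profiled on a Raspberry Pi
        self.dt = control_dt_s
        self.queue = deque()
        self.default = default_action

    def step(self, action):
        """Enqueue the new action with a sampled delay; return the action that
        actually reaches the robot at this control step."""
        delay_steps = round(random.choice(self.latencies) / self.dt)
        self.queue.append((delay_steps, action))
        released, remaining = self.default, deque()
        for steps_left, a in self.queue:
            if steps_left <= 0:
                released = a                      # newest action whose delay elapsed
            else:
                remaining.append((steps_left - 1, a))
        self.queue = remaining
        return released

delayer = LatencyDelayedActions(measured_latencies_s=[0.02, 0.05, 0.08], control_dt_s=0.02)
for t in range(5):
    print(t, delayer.step(action=t))  # actions arrive late, as they would on-device
```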
    Evolving Reinforcement Learning Algorithms
    JD Co-Reyes
    Yingjie Miao
    Daiyi Peng
    Sergey Levine
    Honglak Lee
    International Conference on Learning Representations (ICLR) (2021) (to appear)
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
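A tiny sketch of what "a loss function as a computational graph" can look like: primitive operations composed into a graph and evaluated on a batch, with the classic TD loss as one point in that search space. The representation and primitives below are illustrative assumptions, not the paper's search method.

```python
# Sketch only: a value-based RL loss written as a small graph of primitive ops.
# A search over such graphs would mutate/recombine them; the classic TD loss
# shown here is one point in that space (which the method can rediscover).
import numpy as np

PRIMITIVES = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
    "square": lambda x: x ** 2,
    "max": lambda x: np.max(x, axis=-1),
}

def eval_graph(node, inputs):
    """Recursively evaluate a loss graph: a node is a leaf name or (op, children)."""
    if isinstance(node, str):
        return inputs[node]
    op, children = node
    return PRIMITIVES[op](*[eval_graph(c, inputs) for c in children])

# TD loss graph: (r + gamma * max_a' Q_target(s', a') - Q(s, a))^2
td_loss_graph = (
    "square",
    [("sub", [("add", ["r", ("mul", ["gamma", ("max", ["q_next"])])]), "q_sa"])],
)

batch = {
    "r": np.array([1.0, 0.0]),
    "gamma": 0.99,
    "q_sa": np.array([0.5, 0.2]),
    "q_next": np.array([[0.3, 0.7], [0.1, 0.4]]),  # Q_target(s', a') for all a'
}
print("per-sample TD loss:", eval_graph(td_loss_graph, batch))
```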
    Tiny Robot Learning (tinyRL) for Source Seeking on a Nano Quadcopter
    Bardienus Pieter Duisterhof
    Srivatsan Krishnan
    Jonathan J. Cruz
    Colby R. Banbury
    William Fu
    Guido C. H. E. de Croon
    Vijay Janapa Reddi
    IEEE International Conference on Robotics and Automation (ICRA) (2021) (to appear)
We present fully autonomous source seeking onboard a highly constrained nano quadcopter, by contributing application-specific system and observation feature design to enable inference of a deep-RL policy onboard a nano quadcopter. Our deep-RL algorithm finds a high-performance solution to a challenging problem, even in the presence of high noise levels, and generalizes across real and simulation environments with different obstacle configurations. We verify our approach with simulation and in-field testing on a CrazyFlie using only the cheap and ubiquitous Cortex-M4 microcontroller unit. The results show that by end-to-end application-specific system design, our contribution consumes almost three times less additional power, as compared to a competing learning-based navigation approach onboard a nano quadcopter. Thanks to our observation space, which we carefully design within the resource constraints, our solution achieves a 94% success rate in cluttered and randomized test environments, as compared to the previously achieved 80%. We also compare our strategy to a simple finite state machine (FSM), geared towards efficient exploration, and demonstrate that our policy is more robust and resilient at obstacle avoidance as well as up to 70% more efficient in source seeking. To this end, we contribute a cheap and lightweight end-to-end tiny robot learning (tinyRL) solution, running onboard a nano quadcopter, that proves to be robust and efficient in a challenging task.
    SparseDice: Imitation Learning for Temporally Sparse Data via Regularization
    Alberto Camacho
    Izzeddin Gur
    Marcin Lukasz Moczulski
    Ofir Nachum
    Unsupervised Reinforcement Learning Workshop, collocated with ICML 2021 (2021)
Imitation learning learns how to act by observing the behavior of an expert demonstrator. We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to the whole trajectories). Our setup reflects the limitations of real-world problems when accessing the expert data. For example, user logs may contain incomplete traces of behavior, or in robotics non-technical human demonstrators may describe trajectories using only a subset of all state-action pairs. A recent approach to imitation learning via distribution matching, ValueDice, tends to overfit when demonstrations are temporally sparse. We counter the overfitting by contributing regularization losses. Our empirical evaluation with Mujoco benchmarks shows that we can successfully learn from very sparse and scarce expert data. Moreover, (i) the quality of the learned policies is often comparable to those learned with full expert trajectories, and (ii) the number of training steps required to learn from sparse data is similar to the number of training steps when the agent has access to full expert trajectories.
    The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines
    Srivatsan Krishnan
    Zishen Wan
    Kshitij Bhardwaj
    Paul Whatmough
    Gu-Yeon Wei
    David Brooks
    Vijay Janapa Reddi
IEEE Computer Architecture Letters (CAL), vol. 19 (2020), pp. 38-42
We introduce the “Formula-1” (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). The model serves as a tool that can aid computer and cyber-physical system architects to understand the optimal design (or selection) of various components in the development of autonomous machines.
    Quantized Reinforcement Learning (QuaRL)
    Srivatsan Krishnan
    Sharad Chitlangia
    Maximilian Lam
    Zishen Wan
    Vijay Janapa Reddi
    1st Workshop on Resource-Constrained Machine Learning (ReCoML) @ MLSys (2020) (to appear)
Recent work has shown that quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, applied traditionally to image-based models, work with the same efficacy to the sequential decision making process in reinforcement learning remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the effects of quantization on various deep reinforcement learning policies with the intent to reduce their computational resource demands. We apply techniques such as post-training quantization and quantization aware training to a spectrum of reinforcement learning tasks (such as Pong, Breakout, BeamRider, and more) and training algorithms (such as PPO, A2C, DDPG, and DQN). Across this spectrum of tasks and learning algorithms, we show that policies can be quantized to 6-8 bits of precision without loss of accuracy. We also show that certain tasks and reinforcement learning algorithms yield policies that are more difficult to quantize due to their effect of widening the models' distribution of weights and that quantization aware training consistently improves results over post-training quantization and oftentimes even over the full precision baseline. Finally, we demonstrate real-world applications of quantization for reinforcement learning. We use half-precision training to train a Pong model 50% faster, and we deploy a quantized reinforcement learning based navigation policy to an embedded system, achieving an 18X speedup and a 4X reduction in memory usage over an unquantized policy.
    Neural Collision Clearance Estimator for Batched Motion Planning
    Brian Andrew Ichter
    Maryam Bandari
    Edward Lee
    The 14th International Workshop on the Algorithmic Foundations of Robotics (WAFR) (2020)
We present a neural network collision checking heuristic, ClearanceNet, and a planning algorithm, CN-RRT. ClearanceNet learns to predict separation distance (the minimum distance between the robot and the workspace) for a given workspace. CN-RRT then efficiently computes a motion plan by leveraging three key features of ClearanceNet. First, CN-RRT explores the space by expanding multiple nodes at the same time, processing batches of thousands of collision checks. Second, CN-RRT adaptively relaxes its clearance requirements for more difficult problems. Third, to repair errors, CN-RRT shifts its nodes in the direction of ClearanceNet’s gradient and repairs any residual errors with a traditional RRT, thus maintaining theoretical probabilistic completeness guarantees. In configuration spaces with up to 30 degrees of freedom, ClearanceNet achieves an 845x speedup over traditional collision detection methods, while CN-RRT accelerates motion planning by up to 42% over a baseline and finds paths up to 36% more efficient. Experiments on an 11-degree-of-freedom robot in a cluttered environment confirm the method’s feasibility on real robots.
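The gradient-based repair step can be sketched as follows: nudge a configuration uphill on a differentiable clearance estimate until the predicted clearance exceeds a threshold. A toy analytic clearance field and a finite-difference gradient stand in for the learned network here.

```python
# Illustrative sketch of gradient-based repair (not ClearanceNet itself): move a
# configuration along the gradient of a clearance estimate until the predicted
# clearance exceeds a threshold. A toy analytic field replaces the learned net;
# a real system would backprop through the model instead.
import numpy as np

OBSTACLE, RADIUS = np.array([0.0, 0.0]), 1.0

def clearance(q):
    """Toy clearance: signed distance from a disk obstacle (stand-in for the net)."""
    return np.linalg.norm(q - OBSTACLE) - RADIUS

def numeric_grad(f, q, h=1e-5):
    g = np.zeros_like(q)
    for i in range(len(q)):
        dq = np.zeros_like(q)
        dq[i] = h
        g[i] = (f(q + dq) - f(q - dq)) / (2 * h)
    return g

def repair(q, min_clearance=0.1, step=0.05, iters=100):
    q = q.astype(float)
    for _ in range(iters):
        if clearance(q) >= min_clearance:
            break
        q = q + step * numeric_grad(clearance, q)  # move along the clearance gradient
    return q

bad = np.array([0.5, 0.2])                          # starts inside the obstacle
fixed = repair(bad)
print("clearance before/after:", clearance(bad), clearance(fixed))
```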
    Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous
    Rose E. Wang
    Dennis Lee
    Edward Lee
    Brian Andrew Ichter
    Conference on Robot Learning (CoRL) (2020)
Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point-to-point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn, via self-supervision, motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
    Learned Critical Probabilistic Roadmaps for Robotic Motion Planning
    Brian Ichter
    Edward Schmerling
Tsang-Wei Edward Lee
    ICRA (2020)
Sampling-based motion planning techniques have emerged as an efficient algorithmic paradigm for solving complex motion planning problems. These approaches use a set of probing samples to construct an implicit graph representation of the robot’s state space, allowing arbitrarily accurate representations as the number of samples increases to infinity. In practice, however, solution trajectories only rely on a few critical states, often defined by structure in the state space (e.g., doorways). In this work we propose a general method to identify these critical states via graph-theoretic techniques (betweenness centrality) and learn to predict criticality from only local environment features. These states are then leveraged more heavily via global connections within a hierarchical graph, termed Critical Probabilistic Roadmaps. Critical PRMs are demonstrated to achieve up to three orders of magnitude improvement over uniform sampling, while preserving the guarantees and complexity of sampling-based motion planning. A video is available at https://youtu.be/AYoD-pGd9ms.
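A small sketch of the criticality signal: betweenness centrality over a roadmap graph makes doorway-like samples stand out because many shortest paths pass through them. The two-room toy graph below is an illustrative assumption; only the graph-theoretic labeling step is shown, not the learned predictor.

```python
# Sketch of the criticality signal (illustrative): betweenness centrality over a
# roadmap graph highlights "doorway-like" samples shared by many shortest paths.
import networkx as nx

# Two rooms of free-space samples joined by a single doorway node "d".
G = nx.Graph()
room_a = [f"a{i}" for i in range(5)]
room_b = [f"b{i}" for i in range(5)]
G.add_edges_from((u, v) for i, u in enumerate(room_a) for v in room_a[i + 1:])
G.add_edges_from((u, v) for i, u in enumerate(room_b) for v in room_b[i + 1:])
G.add_edges_from([("a0", "d"), ("d", "b0")])        # the doorway connects the rooms

centrality = nx.betweenness_centrality(G)
critical = sorted(centrality, key=centrality.get, reverse=True)[:3]
print("most critical samples:", critical)           # the doorway ranks first
```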
Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations for a legged robot is less practical because the robot is hard to control, walks slowly, and cannot run continuously for a long time. In this work, we propose a zero-shot imitation learning framework for training a visual navigation policy on a legged robot from human demonstration (third-person perspective) only, allowing for more cost-effective data collection with better navigation capability. However, imitation learning from third-person perspective demonstrations raises unique challenges. Human demonstrations are captured with different camera perspectives; therefore, we design a feature disentanglement network (FDN) that extracts perspective-agnostic state features. We reconstruct missing action labels by either building an inverse model of the robot's dynamics in the feature space and applying it to the demonstrations or developing an efficient GUI to label human demonstrations. We take a model-based imitation learning approach for training a visual navigation policy from the perspective-agnostic, action-labeled demonstrations. We show that our framework can learn an effective visual navigation policy for a legged robot, Laikago, from expert demonstrations in both simulated and real-world environments. Our approach is zero-shot as the robot never tries to navigate a given navigation path in the testing environment before the testing phase. We also justify our framework by performing an ablation study and comparing it with baseline algorithms.
    Safe Policy Learning for Continuous Control
    Ofir Nachum
    Mohammad Ghavamzadeh
    Conference on Robot Learning (CoRL) (2020)
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through near-safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while enforcing near-constraint satisfaction for every policy update by projecting either the policy parameter or the selected action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to the existing constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Moreover, in practice our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with the state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as a real-world robot obstacle-avoidance problem, demonstrating their effectiveness in terms of balancing performance and constraint satisfaction.
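The action-projection idea can be illustrated with the simplest case of a single linearized constraint: if the proposed action violates g·a ≤ eps, project it onto the half-space boundary in closed form. The names and numbers are illustrative; the paper's method handles the full Lyapunov-constrained setting.

```python
# Minimal sketch of action projection (not the paper's full algorithm): a
# linearized safety constraint g . a <= eps defines a half-space; if the policy's
# proposed action violates it, return the closest action on the boundary.
import numpy as np

def project_action(a, g, eps):
    """Euclidean projection of action a onto {a : g . a <= eps}."""
    violation = g @ a - eps
    if violation <= 0:
        return a                                  # already safe, leave it unchanged
    return a - (violation / (g @ g)) * g          # closed-form half-space projection

g = np.array([1.0, 2.0])                          # gradient of the linearized constraint
eps = 0.5
proposed = np.array([1.0, 1.0])                   # policy output, violates g . a <= eps
safe = project_action(proposed, g, eps)
print("g.a before/after:", g @ proposed, g @ safe)  # 3.0 -> 0.5 (on the boundary)
```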
    Long-Range Indoor Navigation with PRM-RL
    Anthony Francis
    Marek Fiser
    Tsang-Wei Lee
    IEEE Transactions on Robotics (T-RO) (2020), pp. 19
Long-range indoor navigation requires guiding robots with noisy sensors and controls through cluttered environments along paths that span a variety of buildings. We achieve this with PRM-RL, a hierarchical robot navigation method in which reinforcement learning agents that map noisy sensors to robot controls learn to solve short-range obstacle avoidance tasks, and then sampling-based planners map where these agents can reliably navigate in simulation; these roadmaps and agents are then deployed on robots, guiding them along the shortest path where the agents are likely to succeed. Here we use Probabilistic Roadmaps (PRMs) as the sampling-based planner, and AutoRL as the reinforcement learning method in the indoor navigation context. We evaluate the method in simulation for kinematic differential drive and kinodynamic car-like robots in several environments, and on differential-drive robots at three physical sites. Our results show PRM-RL with AutoRL is more successful than several baselines, is robust to noise, and can guide robots over hundreds of meters in the face of noise and obstacles in both simulation and on robots, including over 5.8 kilometers of physical robot navigation.
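A toy sketch of the roadmap-construction rule: connect two sampled waypoints only when Monte Carlo rollouts of the short-range agent between them succeed often enough. The stand-in success model and thresholds are assumptions for illustration, not the PRM-RL code.

```python
# Sketch of the roadmap-building step (illustrative): an edge between two sampled
# waypoints is added only if simulated rollouts of the short-range RL agent
# between them succeed often enough.
import random

random.seed(0)

def rollout_succeeds(start, goal):
    """Stand-in for simulating the noisy RL point-to-point agent."""
    distance = sum((s - g) ** 2 for s, g in zip(start, goal)) ** 0.5
    return random.random() < max(0.0, 1.0 - 0.05 * distance)  # farther = less reliable

def add_edge_if_reliable(roadmap, a, b, trials=20, min_success=0.9):
    successes = sum(rollout_succeeds(a, b) for _ in range(trials))
    if successes / trials >= min_success:
        roadmap.setdefault(a, []).append(b)
        roadmap.setdefault(b, []).append(a)

roadmap = {}
samples = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0)]
for i, a in enumerate(samples):
    for b in samples[i + 1:]:
        add_edge_if_reliable(roadmap, a, b)
print(roadmap)  # only pairs the agent can reliably connect become roadmap edges
```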
    Fast Deep Swept Volume Estimator
    John E. G. Baxter
    Satomi Sugaya
    Mohammad R. Yousefi
    Lydia Tapia
    The International Journal of Robotics Research (IJRR) (2020) (to appear)
Despite decades of research on efficient swept volume computation for robotics, computing the exact swept volume is intractable and approximate swept volume algorithms have been computationally prohibitive for applications such as motion and task planning. In this work, we employ Deep Neural Networks (DNNs) for fast swept volume estimation. Since swept volume is a property of robot kinematics, a DNN can be trained off-line once in a supervised manner and deployed in any environment. The trained DNN is fast during on-line swept volume geometry or size inferences. Results show that DNNs can accurately and rapidly estimate swept volumes caused by rotational, translational and prismatic joint motions. Sampling-based planners using the learned distance are up to 5x more efficient and identify paths with smaller swept volumes on simulated and physical robots. Results also show that swept volume geometry estimation with a DNN is over 98.9% accurate and 1200x faster than an octree-based swept volume algorithm.
    Learning to Navigate the Web
    Izzeddin Gur
    Dilek Hakkani-Tur
    International Conference on Learning Representations (ICLR) (2019)
Learning in environments with large state and action spaces, and sparse rewards, can hinder a Reinforcement Learning (RL) agent’s learning through trial-and-error. For instance, following natural language instructions on the Web (such as booking a flight ticket) leads to RL settings where the input vocabulary and the number of actionable elements on a page can grow very large. Even though recent approaches improve the success rate on relatively simple environments with the help of human demonstrations to guide the exploration, they still fail in environments where the set of possible instructions can reach millions. We approach the aforementioned problems from a different perspective and propose guided RL approaches that can generate an unbounded amount of experience for an agent to learn from. Instead of learning from a complicated instruction with a large vocabulary, we decompose it into multiple sub-instructions and schedule a curriculum in which an agent is tasked with a gradually increasing subset of these relatively easier sub-instructions. In addition, when the expert demonstrations are not available, we propose a novel meta-learning framework that generates new instruction-following tasks and trains the agent more effectively. We train a DQN, a deep reinforcement learning agent, with its Q-value function approximated by a novel QWeb neural network architecture on these smaller, synthetic instructions. We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstration, achieving a 100% success rate on several difficult environments.
    Toward Exploring End-to-End Learning Algorithms for Autonomous Aerial Machines
    Srivatsan Krishnan
    Behzad Boroujerdian
    Vijay Janapa Reddi
    Algorithms and Architectures for Learning in-the-Loop Systems in Autonomous Flight @ ICRA (2019)
We develop AirLearning, a tool suite for end-to-end closed-loop UAV analysis, equipped with a customized yet randomized environment generator in order to expose the UAV to a diverse set of challenges. We take Deep Q Networks (DQN) as an example deep reinforcement learning algorithm and use curriculum learning to train a point-to-point obstacle avoidance policy. While we determine the best policy based on the success rate, we evaluate it under strict resource constraints on an embedded platform such as the Ras-Pi 3. Using a hardware-in-the-loop methodology, we quantify the policy’s performance with quality-of-flight metrics such as energy consumed, endurance and the average length of the trajectory. We find that the trajectories produced on the embedded platform are very different from those predicted on the desktop, resulting in up to 26.43% longer trajectories. Quality-of-flight metrics with hardware in the loop characterize those differences in simulation, thereby exposing how the choice of onboard compute contributes to shortening or widening of the ‘Sim2Real’ gap.
This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly-exploring Random Tree (RRT), and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT, which uses the RL policy as a local planner and the reachability estimator as the distance function to bias tree growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state-of-the-art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering-function-free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference. Video: https://youtu.be/dDMVMTOI8KY
    Learning Navigation Behaviors End-to-End with AutoRL
    Marek Fiser
    Anthony Francis
    IEEE Robotics and Automation Letters (RA-L), vol. 4 (2019), pp. 2007-2014
We learn end-to-end point-to-point and path-following navigation behaviors that avoid moving obstacles. These policies receive noisy lidar observations and output robot linear and angular velocities. The policies are trained in small, static environments with AutoRL, an evolutionary automation layer around Reinforcement Learning (RL) that searches for a deep RL reward and neural network architecture with large-scale hyper-parameter optimization. AutoRL first finds a reward that maximizes task completion, and then finds a neural network architecture that maximizes the cumulative value of the found reward. Empirical evaluations, both in simulation and on-robot, show that AutoRL policies do not suffer from the catastrophic forgetfulness that plagues many other deep reinforcement learning algorithms, generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. Our path-following and point-to-point policies are respectively 23% and 26% more successful than comparison methods across new environments. Video at: https://youtu.be/0UwkjpUEcbI
    Evolving Rewards to Automate Reinforcement Learning
    Anthony Francis
    Dar Mehta
    6th ICML Workshop on Automated Machine Learning (2019)
Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four Mujoco continuous control tasks over two RL algorithms, shows improvements over baselines, with the biggest gains for more complex tasks. The video can be found at: https://youtu.be/svdaOFfQyC8.
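A toy sketch of the outer search loop: treat reward-term weights as hyperparameters, score each population member by the raw task objective after (stand-in) training, and mutate the best. The quadratic stand-in objective and mutation scheme are illustrative assumptions, not AutoRL itself.

```python
# Toy sketch of an evolutionary reward search (illustrative, not AutoRL): reward
# weights are hyperparameters; each generation trains/evaluates a population and
# mutates the best-scoring member.
import random

random.seed(0)

def train_and_evaluate(reward_weights):
    """Stand-in for training an RL agent with these reward weights and returning
    its true task objective (e.g. success rate); here a made-up quadratic bowl."""
    target = [0.7, 0.2, 0.1]
    return -sum((w - t) ** 2 for w, t in zip(reward_weights, target))

def mutate(weights, scale=0.1):
    return [max(0.0, w + random.gauss(0.0, scale)) for w in weights]

population = [[random.random() for _ in range(3)] for _ in range(8)]
for generation in range(20):
    scored = sorted(population, key=train_and_evaluate, reverse=True)
    best = scored[0]
    population = [best] + [mutate(best) for _ in range(7)]   # elitism + mutation
print("best reward weights found:", [round(w, 2) for w in best])
```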
Deep Reinforcement Learning (RL) has recently emerged as a solution for moving obstacle avoidance. Deep RL learns to simultaneously predict obstacle motions and corresponding avoidance actions directly from robot sensors, even for obstacles with different dynamics models. However, deep RL methods typically cannot guarantee policy convergence, i.e., cannot provide probabilistic collision avoidance guarantees. In contrast, stochastic reachability (SR), a computationally expensive formal method that employs a known obstacle dynamics model, identifies the optimal avoidance policy and provides strict convergence guarantees. The availability of the optimal solution for versions of the moving obstacle problem provides a baseline against which to compare trained deep RL policies. In this paper, we compare the expected cumulative reward and actions of these policies to SR, and find the following. 1) The state-value function approximates the optimal collision probability well, thus explaining the high empirical performance. 2) RL policies deviate significantly from the optimal, thus negatively impacting collision avoidance in some cases. 3) Evidence suggests that the deviation is caused, at least partially, by the actor net failing to approximate the action corresponding to the highest state-action value.
    MAVBench: Micro Aerial Vehicle Benchmarking
    Behzad Boroujerdian
    Hasan Genc
    Srivatsan Krishnan
    Wenzhi Cui
    Vijay Janapa Reddi
    2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp. 894-907
Unmanned Aerial Vehicles (UAVs) are getting closer to becoming ubiquitous in everyday life. Among them, Micro Aerial Vehicles (MAVs) have seen an outburst of attention recently, specifically in areas with a demand for autonomy. A key challenge standing in the way of making MAVs autonomous is that researchers lack a comprehensive understanding of how performance, power, and computational bottlenecks affect MAV applications. MAVs must operate under a stringent power budget, which severely limits their flight endurance time. As such, there is a need for new tools, benchmarks, and methodologies to foster the systematic development of autonomous MAVs. In this paper, we introduce the “MAVBench” framework which consists of a closed-loop simulator and an end-to-end application benchmark suite. A closed-loop simulation platform is needed to probe and understand the intra-system (application data flow) and inter-system (system and environment) interactions in MAV applications to pinpoint bottlenecks and identify opportunities for hardware and software co-design and optimization. In addition to the simulator, MAVBench provides a benchmark suite, the first of its kind, consisting of a variety of MAV applications designed to enable computer architects to perform characterization and develop future aerial computing systems. Using our open-source, end-to-end experimental platform, we uncover a hidden, and thus far unexpected, compute to total system energy relationship in MAVs. Furthermore, we explore the role of compute by presenting three case studies targeting performance, energy and reliability. These studies confirm that an efficient system design can improve an MAV’s battery consumption by up to 1.8X.
    Why Compute Matters for UAV Energy Efficiency?
    Behzad Boroujerdian
    Hasan Genc
    Srivatsan Krishnan
    Vijay Janapa Reddi
    International Symposium on Aerial Robotics (2018)
    Preview abstract Unmanned Aerial Vehicles (UAVs) are getting closer to becoming ubiquitous in everyday life. Although researchers in the robotics domain have made rapid progress in recent years, hardware and software architects in the computer architecture community lack a comprehensive understanding of how performance, power, and computational bottlenecks affect UAV applications. Such an understanding enables system architects to design microchips tailored for aerial agents. This paper is an attempt by computer architects to initiate a discussion between the two academic domains by investigating the underlying compute system's impact on aerial robotic applications. To do so, we identify performance and energy constraints and examine the impact of various compute knobs, such as processor cores and frequency, on these constraints. Our experiments show that such knobs allow for up to a 5X speedup for a wide class of applications. View details
    Deep Neural Networks for Swept Volume Prediction Between Configurations
    Hao-Tien Chiang
    Lydia Tapia
    Third Workshop on Machine Learning in Planning and Control of Robot Motion at ICRA (2018)
    Preview abstract Swept Volume (SV), the volume displaced by an object when it is moving along a trajectory, is considered a useful metric for motion planning. First, SV has been used to identify collisions along a trajectory, because it directly measures the amount of space required for an object to move. Second, in sampling-based motion planning, SV is an excellent distance metric, because it correlates to the likelihood of success of the expensive local planning step between two sampled configurations. However, in both of these applications, traditional SV algorithms are too computationally expensive for efficient motion planning. In this work, we train Deep Neural Networks (DNNs) to learn the size of SV for specific robot geometries. Results for two robots, a 6 degree of freedom (DOF) rigid body and a 7 DOF fixed-base manipulator, indicate that the network estimates are very close to the true size of SV and are more than 1500 times faster than a state-of-the-art SV estimation algorithm. View details
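    The sketch below illustrates the regression setup described above under a toy stand-in target: a small network maps a (start, goal) configuration pair to a scalar swept-volume estimate. The network size, training loop, and proxy target function are assumptions for illustration, not the architecture or data used in the paper.

        # Minimal sketch: a one-hidden-layer regression network mapping configuration
        # pairs to a scalar, trained on a toy proxy for swept volume.
        import numpy as np

        rng = np.random.default_rng(0)
        dof = 6
        X = rng.uniform(-1, 1, size=(2000, 2 * dof))          # (start, goal) pairs
        y = np.linalg.norm(X[:, :dof] - X[:, dof:], axis=1)    # toy proxy target

        W1 = rng.normal(0, 0.1, (2 * dof, 64)); b1 = np.zeros(64)
        W2 = rng.normal(0, 0.1, (64, 1));       b2 = np.zeros(1)
        lr = 1e-2
        for _ in range(500):
            h = np.maximum(X @ W1 + b1, 0.0)                   # ReLU hidden layer
            pred = (h @ W2 + b2).ravel()
            err = pred - y
            # Backpropagate the mean-squared-error gradient.
            dW2 = h.T @ err[:, None] / len(X); db2 = err.mean(keepdims=True)
            dh = (err[:, None] @ W2.T) * (h > 0) / len(X)
            dW1 = X.T @ dh; db1 = dh.sum(axis=0)
            W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
        print("final MSE on the toy target:", float((err ** 2).mean()))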
    Fast Swept Volume Estimation with Deep Learning
    Satomi Sugaya
    Lydia Tapia
    The 13th International Workshop on the Algorithmic Foundations of Robotics (WAFR) (2018)
    Preview abstract Swept volume, the volume displaced by a moving object, is an ideal distance metric for sampling-based motion planning because it directly correlates to the amount of motion between two states. However, even approximate algorithms are computationally prohibitive. Our fundamental approach is the application of deep learning to efficiently estimate swept volume within a 5%-10% error for all robots tested, from rigid bodies to manipulators. However, even inference via the trained network can be computationally costly given the often hundreds of thousands of computations required by sampling-based motion planning. To address this, we demonstrate an efficient hierarchical approach for applying our trained estimator. This approach first pre-filters samples using a weighted Euclidean estimator trained via swept volume. Then, it selectively applies the deep neural network estimator. The first estimator, although less accurate, has metric space properties. The second estimator is a high-fidelity unbiased estimator without metric space properties. We integrate the hierarchical selection approach into both roadmap-based and tree-based sampling motion planners. Empirical evaluation on the robot set demonstrates that hierarchical application of the metrics yields up to 5000 times faster planning than state-of-the-art swept volume approximation and up to five times higher probability of finding a collision-free trajectory under a fixed time budget than the traditional Euclidean metric. View details
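    A minimal sketch of the two-stage selection idea above, assuming illustrative joint weights and a stubbed high-fidelity estimator: the cheap weighted Euclidean metric pre-filters candidate neighbors, and the expensive estimator re-ranks only the survivors.

        # Minimal sketch: hierarchical neighbor selection with a cheap pre-filter
        # followed by a (stubbed) high-fidelity swept-volume estimator.
        import numpy as np

        rng = np.random.default_rng(1)
        weights = np.array([2.0, 2.0, 1.0, 0.5, 0.5, 0.25])   # assumed joint weights

        def weighted_euclidean(q1, q2):
            return float(np.sqrt(np.sum(weights * (q1 - q2) ** 2)))

        def dnn_swept_volume(q1, q2):
            # Hypothetical stand-in for the trained high-fidelity network.
            return weighted_euclidean(q1, q2) * (1.0 + 0.1 * rng.standard_normal())

        def nearest_by_hierarchy(query, candidates, prefilter_k=10, final_k=3):
            # Stage 1: rank all candidates with the cheap metric, keep the best few.
            ranked = sorted(candidates, key=lambda q: weighted_euclidean(query, q))[:prefilter_k]
            # Stage 2: re-rank the survivors with the expensive estimator.
            return sorted(ranked, key=lambda q: dnn_swept_volume(query, q))[:final_k]

        candidates = [rng.uniform(-np.pi, np.pi, 6) for _ in range(1000)]
        query = rng.uniform(-np.pi, np.pi, 6)
        print(len(nearest_by_hierarchy(query, candidates)), "neighbors selected")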
    PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
    Oscar Ramirez
    Marek Fiser
    Ken Oslund
    Anthony Francis
    James Davidson
    Lydia Tapia
    IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia (2018), pp. 5113-5120
    Preview abstract We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling-based path planning with reinforcement learning (RL) agents. The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology, while the sampling-based planners provide an approximate map of the space of possible configurations of the robot from which collision-free trajectories feasible for the RL agents can be identified. The same RL agents are used to control the robot under the direction of the planner, enabling long-range navigation. We use Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. These evaluations included both simulated environments and on-robot tests. Our results show improvement in navigation task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than the one used in training. View details
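    A minimal sketch of the roadmap-construction idea, assuming a stubbed point-to-point policy: nodes are sampled configurations, and an edge is added only when the RL agent's rollout succeeds in both directions, so every roadmap edge remains executable by the same policy at run time. The rl_agent_can_navigate helper is a hypothetical stand-in, not the paper's agent.

        # Minimal sketch: build a PRM whose edges are validated by a (stubbed)
        # short-range RL navigation policy rather than a straight-line local planner.
        import itertools
        import random

        def rl_agent_can_navigate(start, goal):
            # Hypothetical stub: roll out the short-range RL policy from start toward
            # goal and report success; here success is a toy distance heuristic.
            dist = sum((a - b) ** 2 for a, b in zip(start, goal)) ** 0.5
            return dist < 3.0 and random.random() > 0.1

        def build_prm_rl_roadmap(n_nodes=30, bounds=(0.0, 10.0)):
            nodes = [(random.uniform(*bounds), random.uniform(*bounds)) for _ in range(n_nodes)]
            edges = set()
            # Connect node pairs only when the RL agent succeeds in both directions.
            for i, j in itertools.combinations(range(n_nodes), 2):
                if rl_agent_can_navigate(nodes[i], nodes[j]) and rl_agent_can_navigate(nodes[j], nodes[i]):
                    edges.add((i, j))
            return nodes, edges

        nodes, edges = build_prm_rl_roadmap()
        print(f"{len(nodes)} nodes, {len(edges)} RL-executable edges")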
    FollowNet: Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning
    Pararth Shah
    Marek Fiser
    Dilek Hakkani-Tur
    Third Machine Learning in Planning and Control of Robot Motion Workshop at ICRA (2018)
    Preview abstract Understanding and following directions provided by humans can enable robots to navigate effectively in unknown situations. We present FollowNet, an end-to-end differentiable neural architecture for learning multi-modal navigation policies. FollowNet maps natural language instructions as well as visual and depth inputs to locomotion primitives. FollowNet processes instructions using an attention mechanism conditioned on its visual and depth input to focus on the relevant parts of the command while performing the navigation task. Deep reinforcement learning (RL) with a sparse reward simultaneously learns the state representation, the attention function, and control policies. We evaluate our agent on a dataset of complex natural language directions that guide the agent through a rich and realistic dataset of simulated homes. We show that the FollowNet agent learns to execute previously unseen instructions described with a similar vocabulary, and successfully navigates along paths not encountered during training. The agent shows a 30% improvement over a baseline model without the attention mechanism, with a 52% success rate on novel instructions. View details
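    A minimal sketch of the attention step described above, with assumed dimensions: instruction-token embeddings are weighted by a softmax over scores computed against the current visual/depth encoding, producing a context vector a policy could consume. This uses generic scaled dot-product attention for illustration, not FollowNet's exact parameterization.

        # Minimal sketch: attend over instruction-token embeddings with a query
        # derived from the current visual/depth features.
        import numpy as np

        rng = np.random.default_rng(2)
        d = 32
        instruction_tokens = rng.normal(size=(7, d))   # embedded words of the command
        visual_features = rng.normal(size=d)           # current visual + depth encoding

        def attend(tokens, query):
            scores = tokens @ query / np.sqrt(tokens.shape[1])   # scaled dot-product scores
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()                             # softmax over tokens
            return weights @ tokens, weights                     # context vector, attention map

        context, attn = attend(instruction_tokens, visual_features)
        print("attention over instruction words:", np.round(attn, 3))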
    Resilient Computing with Reinforcement Learning on a Dynamical System: Case study in Sorting
    Brad Aimone
    Conrad James
    Lydia Tapia
    57th IEEE Conference on Decision and Control (2018)
    Preview abstract This paper poses general computation as a feedback-control problem. This formulation allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. Computing is then posed as a sequential decision-making problem, solved with RL, and analyzed with Lyapunov stability theory to assess the agent's progression toward the goal and its resilience. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress toward an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble Sort. View details
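    The sketch below illustrates the feedback-control framing rather than the paper's RL agent: the array is the state, adjacent swaps are the actions, and a simple controller keeps acting on the first out-of-order pair until the sorted goal state is reached, even when individual swaps occasionally fail. The fault model and controller are illustrative assumptions.

        # Minimal sketch: sorting as feedback control on the array state, with a
        # simple fault model where an attempted swap may silently fail.
        import random

        def sortedness(a):
            # Number of adjacent pairs already in order; maximal at the sorted goal state.
            return sum(a[i] <= a[i + 1] for i in range(len(a) - 1))

        def attempt_swap(a, i, fault_prob=0.1):
            # Swap positions i and i+1; with some probability the actuator fails (no-op).
            b = list(a)
            if random.random() > fault_prob:
                b[i], b[i + 1] = b[i + 1], b[i]
            return b

        def feedback_sort(a, max_steps=10_000):
            steps = 0
            while sortedness(a) < len(a) - 1 and steps < max_steps:
                # Feedback law: act on the first out-of-order adjacent pair observed.
                i = next(k for k in range(len(a) - 1) if a[k] > a[k + 1])
                a = attempt_swap(a, i)      # the swap may fail, so re-observe and retry
                steps += 1
            return a, steps

        print(feedback_sort([5, 3, 4, 1, 2]))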
    Preview abstract Robot motion planning often requires finding trajectories that balance different user intents, or preferences. One of these preferences is usually arrival at the goal, while another might be obstacle avoidance. Here, we formalize these, and similar, tasks as preference balancing tasks (PBTs) on acceleration-controlled robots, and propose a motion planning solution, PrEference Appraisal Reinforcement Learning (PEARL). PEARL uses reinforcement learning on a restricted training domain, combined with features engineered from user-given intents. PEARL's planner then generates trajectories in expanded domains for more complex problems. We present an adaptation for rejection of stochastic disturbances and offer in-depth analysis, including task completion conditions and behavior analysis when the conditions do not hold. PEARL is evaluated on five problems, two multi-agent obstacle avoidance tasks and three that stochastically disturb the system at run-time: 1) a multi-agent pursuit problem with 1000 pursuers, 2) robot navigation through 900 moving obstacles, which is trained in an environment with only 4 static obstacles, 3) aerial cargo delivery, 4) two-robot rendezvous, and 5) flying inverted pendulum. Lastly, we evaluate the method on a physical quadrotor UAV robot with a suspended load influenced by a stochastic disturbance. View details
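    A minimal sketch of the preference-feature idea, with assumed weights, features, and point-mass dynamics: each intent (reach the goal, keep clearance from obstacles) contributes an engineered feature, and a greedy one-step planner picks the acceleration that maximizes their weighted combination. In PEARL the weights come from reinforcement learning; here they are fixed by hand for illustration.

        # Minimal sketch: preference features combined by fixed weights, used by a
        # greedy one-step planner over candidate accelerations (toy point-mass model).
        import numpy as np

        goal = np.array([10.0, 10.0])
        obstacles = [np.array([5.0, 5.0]), np.array([7.0, 9.0])]
        weights = np.array([1.0, 2.0])       # assumed preference weights

        def intent_features(pos):
            to_goal = -np.linalg.norm(goal - pos)                                  # goal-attraction feature
            clearance = min(2.0, min(np.linalg.norm(pos - o) for o in obstacles))  # capped clearance feature
            return np.array([to_goal, clearance])

        def pearl_step(pos, vel, dt=0.1):
            # Greedy one-step choice over a small set of candidate accelerations.
            candidates = [np.array([ax, ay]) for ax in (-1, 0, 1) for ay in (-1, 0, 1)]
            best = max(candidates,
                       key=lambda a: weights @ intent_features(pos + vel * dt + 0.5 * a * dt ** 2))
            vel = vel + best * dt
            return pos + vel * dt, vel

        pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
        for _ in range(200):
            pos, vel = pearl_step(pos, vel)
        print("final position:", np.round(pos, 2))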