Vikas Sindhwani
Authored Publications
Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Saminda Abeyruwan
Nick Boffi
Anish Shankar
Jean-Jacques Slotine
Stephen Tu
Learning for Dynamics and Control (2023)
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object, with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs, including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality, via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching.
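The zeroth-order strategy (ii) can be pictured with a generic antithetic evolution-strategies gradient estimator. This is a minimal sketch of the general technique, not the paper's training code; the toy quadratic objective stands in for a policy's return.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pairs=32, rng=None):
    """Antithetic zeroth-order estimate of grad f(theta) from function calls only."""
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        # Central difference along a random Gaussian direction.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2 * sigma) * eps
    return grad / n_pairs

# Toy stand-in for a policy-return objective: minimize ||theta - 3||^2.
f = lambda th: float(np.sum((th - 3.0) ** 2))
rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(200):
    theta -= 0.05 * es_gradient(f, theta, rng=rng)
```

Because only function evaluations are needed, the same loop applies unchanged when `f` is a full simulated rollout.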
Single-Level Differentiable Contact Simulation
Simon Le Cleac'h
Mac Schwager
Zachary Manchester
IEEE RAL (2023)
We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because isolating the collision detection problem introduces contact location non-uniqueness. Our approach combines contact simulation and collision detection into a unified single-level optimization problem. This disambiguates the collision detection problem in a physics-informed manner. Compared to previous differentiable simulation approaches, our formulation features improved simulation robustness and computational complexity improved by more than an order of magnitude. We provide a numerically efficient implementation of our formulation in the Julia language called DojoLight.jl (https://github.com/simon-lc/DojoLight.jl).
Robotic Table Tennis: A Case Study into a High Speed Learning System
Jon Abelian
Saminda Abeyruwan
Michael Ahn
Justin Boyd
Erwin Johan Coumans
Omar Escareno
Wenbo Gao
Navdeep Jaitly
Juhana Kangaspunta
Satoshi Kataoka
Gus Kouretas
Yuheng Kuang
Corey Lynch
Thinh Nguyen
Ken Oslund
Barney J. Reed
Anish Shankar
Avi Singh
Grace Vesom
Peng Xu
Robotics: Science and Systems (2023)
We present a deep dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human, and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material.
Robotic table wiping via whole-body trajectory optimization and reinforcement learning
Benjie Holson
Jeffrey Bingham
Jonathan Weisz
Mario Prats
Peng Xu
Thomas Lew
Xiaohan Zhang
Yao Lu
ICRA (2022)
We propose an end-to-end framework to enable multipurpose assistive mobile robots to autonomously wipe tables and clean spills and crumbs. This problem is challenging, as it requires planning wiping actions with uncertain latent crumb and spill dynamics over high-dimensional visual observations, while simultaneously guaranteeing constraint satisfaction to enable deployment in unstructured environments. To tackle this problem, we first propose a stochastic differential equation (SDE) to model crumb and spill dynamics and absorption with the robot wiper. Then, we formulate a stochastic optimal control problem for planning wiping actions over visual observations, which we solve using reinforcement learning (RL). We then propose a whole-body trajectory optimization formulation to compute joint trajectories that execute wiping actions while guaranteeing constraint satisfaction. We extensively validate our table-wiping approach in simulation and on hardware.
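The SDE dynamics model can be pictured with a generic Euler-Maruyama integration step. The scalar mean-reverting process below is an illustrative stand-in of our own, not the paper's actual crumb/spill model.

```python
import numpy as np

# Euler-Maruyama integration of a scalar SDE dx = -a*x dt + s dW.
# The drift -a*x (decay toward zero) loosely mimics mass absorbed by a wiper.
rng = np.random.default_rng(0)
a, s, dt, n_steps = 1.0, 0.1, 0.01, 1000
x = np.empty(n_steps)
x[0] = 1.0  # initial "spill mass"
for t in range(n_steps - 1):
    x[t + 1] = x[t] - a * x[t] * dt + s * np.sqrt(dt) * rng.standard_normal()
```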
Trajectory Optimization with Optimization-Based Dynamics
Taylor Howell
Simon Le Cleac'h
Zachary Manchester
ICRA (2022)
We present a framework for bi-level trajectory optimization in which a system's dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimizer. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth behavior to be abstracted away from the upper-level optimizer, and allows classical unconstrained optimizers to synthesize trajectories for more complex systems. We provide a path-following method for efficient evaluation of constrained dynamics and utilize the implicit-function theorem to compute smooth gradients of this representation. We demonstrate the framework by modeling systems from locomotion, aerospace, and manipulation domains including: acrobot with joint limits, cart-pole subject to Coulomb friction, Raibert hopper, rocket landing with thrust limits, and planar-push task with optimization-based dynamics and then optimize trajectories using iterative LQR.
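The implicit-function-theorem gradient computation can be sketched on a deliberately tiny lower-level problem; this scalar example is ours, for illustration, not the paper's dynamics representation.

```python
# Lower-level "dynamics": x*(p) = argmin_x 0.5*A*x^2 - p*x,
# characterized by the stationarity residual r(x, p) = A*x - p = 0.
A = 2.0

def solve_lower(p):
    return p / A  # closed-form minimizer of the lower-level problem

def implicit_grad(p):
    # Implicit function theorem: dx*/dp = -(dr/dx)^(-1) * (dr/dp).
    dr_dx, dr_dp = A, -1.0
    return -dr_dp / dr_dx

# Check against a central finite difference through the solver.
p, eps = 0.7, 1e-6
fd = (solve_lower(p + eps) - solve_lower(p - eps)) / (2 * eps)
```

In the paper's setting the lower-level problem is solved numerically, but the same residual-based gradient formula supplies the smooth derivatives passed to the upper-level trajectory optimizer.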
Hybrid Random Features
Haoxian Chen
Han Lin
Yuanzhe Ma
Arijit Sehanobish
Michael Ryoo
Jake Varley
Valerii Likhosherstov
Dmitry Kalashnikov
Adrian Weller
International Conference on Learning Representations (ICLR) (2022)
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric random features (Rahimi & Recht, 2007) or positive random features (Choromanski et al., 2021b), recently introduced in the context of linear-attention Transformers. By generalizing Bochner’s Theorem for softmax/Gaussian kernels and leveraging random features for compositional kernels, the HRF mechanism provides strong theoretical guarantees: unbiased approximation and strictly smaller worst-case relative errors than its counterparts. We conduct an exhaustive empirical evaluation of HRFs, ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure, to benchmarking implicit-attention Transformers (also for downstream robotics applications), demonstrating their quality in a wide spectrum of machine learning problems.
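The trigonometric random features of Rahimi & Recht, which HRFs generalize, can be sketched as follows. This is the standard construction for the Gaussian kernel, not the HRF mechanism itself.

```python
import numpy as np

def trig_features(x, W, b):
    """Random Fourier features: phi(x) . phi(y) ~= exp(-||x - y||^2 / 2)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(W @ x + b)

rng = np.random.default_rng(0)
d, m = 3, 20000
W = rng.standard_normal((m, d))            # frequencies omega ~ N(0, I)
b = rng.uniform(0.0, 2.0 * np.pi, size=m)  # random phases

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = float(trig_features(x, W, b) @ trig_features(y, W, b))
exact = float(np.exp(-np.sum((x - y) ** 2) / 2.0))
```

The estimator is unbiased, with error shrinking as O(1/sqrt(m)); HRFs adapt the feature construction so that accuracy concentrates in the regions of interest.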
Indirect trajectory optimization methods such as Differential Dynamic Programming (DDP) have found considerable success when planning only under dynamic feasibility constraints. Meanwhile, nonlinear programming (NLP) has been the state-of-the-art approach when faced with additional constraints (e.g., control bounds, obstacle avoidance). However, a naïve implementation of NLP algorithms, e.g., shooting-based sequential quadratic programming (SQP), may suffer from slow convergence, caused by natural instabilities of the underlying system manifesting as poor numerical stability within the optimization. Re-interpreting the DDP closed-loop rollout policy as a sensitivity-based correction to a second-order search direction, we demonstrate how to compute analogous closed-loop policies (i.e., feedback gains) for constrained problems. Our key theoretical result introduces a novel dynamic-programming-based constraint-set recursion that augments the canonical "cost-to-go" backward pass. On the algorithmic front, we develop a hybrid SQP algorithm incorporating DDP-style closed-loop rollouts, enabled via efficient parallelized computation of the feedback gains. Finally, we validate our theoretical and algorithmic contributions on a set of increasingly challenging benchmarks, demonstrating significant improvements in convergence speed over standard open-loop SQP.
Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
Anthony G. Francis
Dmitry Kalashnikov
Edward Lee
Jake Varley
Leila Takayama
Mikael Persson
Peng Xu
Stephen Tu
Xuesu Xiao
Conference on Robot Learning (2022) (to appear)
Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers, a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving the corresponding bi-level optimization problem end-to-end. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves a 40% higher goal-reaching rate in cluttered environments and 65% better sociability when navigating around humans.
Continuous Control and Multiscale Sensor Fusion with Neural CDEs
Francis Edward McCann Ramirez
Jake Varley
IROS & RSS Imitation Learning Workshop (2022)
Even though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control. Machines operate on multiple asynchronous sensing modalities each with different frequencies, e.g., video frames at 30Hz, proprioceptive state at 100Hz, force-torque data at 500Hz, etc. While the classic approach is to batch observations into fixed-time windows then pass them through feed-forward encoders (e.g., with deep networks), we show that there exists a more elegant approach -- one that treats policy learning as modeling latent state dynamics in continuous-time.
Specifically, we present 'InFuser', a unified architecture that trains continuous-time policies with Neural Controlled Differential Equations (CDEs). 'InFuser' evolves a single latent state representation over time by (In)tegrating and (Fus)ing multi-sensory observations (arriving at different frequencies), and inferring actions in continuous time. This enables policies that can react to multi-frequency multi-sensory feedback for truly end-to-end visuomotor control, without discrete-time assumptions. Behavior cloning experiments demonstrate that 'InFuser' learns robust policies for dynamic tasks (e.g., swinging a ball into a cup), notably outperforming several baselines in settings where observations from one sensing modality can arrive at much sparser intervals than others.
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Brian Ichter
Stefan Welker
Aveek Purohit
Michael Ryoo
arXiv (2022)
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT questions, code). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this diversity is symbiotic, and can be leveraged through Socratic Models (SMs): a modular framework in which multiple pretrained models may be composed zero-shot, i.e., via multimodal-informed prompting, to exchange information with each other and capture new multimodal capabilities, without requiring finetuning. With minimal engineering, SMs are not only competitive with state-of-the-art zero-shot image captioning and video-to-text retrieval, but also enable new applications such as (i) answering free-form questions about egocentric video, (ii) engaging in multimodal assistive dialogue with people (e.g., for cooking recipes) by interfacing with external APIs and databases (e.g., web search), and (iii) robot perception and planning. Prototypes are available at socraticmodels.github.io.
A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations
Sohan Rudra
Saksham Goel
Gaurav Aggarwal
NeurIPS 5th Robot Learning Workshop: Trustworthy Robotics (2022) (to appear)
Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks
Daniel Seita
Erwin Johan Coumans
Ken Goldberg
IEEE International Conference on Robotics and Automation (ICRA) (2021)
Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag". In this work, we develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures, including tasks that involve image-based goal-conditioning and multi-step deformable manipulation. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for robotic manipulation that uses learned template matching to infer displacements that can represent pick and place actions. We demonstrate that goal-conditioned Transporter Networks enable agents to manipulate deformable structures into flexibly specified configurations without test-time visual anchors for target locations. We also significantly extend prior results using Transporter Networks for manipulating deformable objects by testing on tasks with 2D and 3D deformables.
Robotic Table Tennis with Model-Free Reinforcement Learning
Wenbo Gao
Navdeep Jaitly
International Conference on Intelligent Robots and Systems (IROS) (2020)
We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs, convolving across time, learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand strokes, whilst achieving an 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.
Transporter Networks: Rearranging the Visual World for Robotic Manipulation
Stefan Welker
Jonathan Chien
Travis Armstrong
Ivan Krasin
Dan Duong
Conference on Robot Learning (CoRL) (2020)
Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass object(s) or an end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input -- which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential tasks, as well as 6DoF pick-and-place. Experiments on 10 simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our methods with hardware in the real world.
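The template-matching idea behind inferring spatial displacements can be illustrated with plain cross-correlation over a feature map. The random map and the hand-made checkerboard patch below are stand-ins of our own for learned deep features, not the network's actual representation.

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.normal(0.0, 0.1, size=(32, 32))   # stand-in for a dense feature map
template = np.array([[ 1., -1.,  1.],
                     [-1.,  1., -1.],
                     [ 1., -1.,  1.]])        # patch with a sharp autocorrelation peak
fmap[9:12, 13:16] += template                 # plant it with top-left corner (9, 13)

# Dense template matching: slide the (pick-centric) crop over the feature map
# and take the argmax of the correlation scores as the inferred place location.
scores = np.array([[np.sum(fmap[i:i + 3, j:j + 3] * template)
                    for j in range(30)] for i in range(30)])
top_left = np.unravel_index(int(np.argmax(scores)), scores.shape)
```

In the full architecture both the query crop and the feature map are learned, and the correlation is performed over translations (and rotations) to parameterize pick-and-place actions.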
An Ode to an ODE
Jared Quincy Davis
Valerii Likhosherstov
Jean-Jacques Slotine
Jake Varley
Honglak Lee
Adrian Weller
NeurIPS (2020)
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem which is intrinsically related to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show on the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds.
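The constraint that parameters evolve on the orthogonal group can be sketched with a Cayley-transform integrator, which keeps a matrix exactly orthogonal under a skew-symmetric generator. This is a generic construction for flows on O(d), not the paper's specific nested system.

```python
import numpy as np

def cayley_step(Q, A, dt=0.01):
    """Advance Q along the flow generated by skew-symmetric A; preserves orthogonality."""
    I = np.eye(Q.shape[0])
    # The Cayley transform (I - dt/2 A)^(-1) (I + dt/2 A) is orthogonal when A^T = -A.
    return Q @ np.linalg.solve(I - 0.5 * dt * A, I + 0.5 * dt * A)

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d))
A = B - B.T              # skew-symmetric generator of the matrix flow
Q = np.eye(d)
for _ in range(100):
    Q = cayley_step(Q, A)

orth_err = float(np.linalg.norm(Q.T @ Q - np.eye(d)))
```

Keeping the parameter flow on this compact manifold is what bounds the Jacobian spectrum and avoids vanishing or exploding gradients.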
Unsupervised Anomaly Detection for Self-flying Delivery Drones
Hakim Sidahmed
Brandon Jones
International Conference on Robotics and Automation (2020)
We propose a novel anomaly detection framework for a fleet of hybrid aerial vehicles executing high-speed package pickup and delivery missions. The detection is based on machine learning models of normal flight profiles, trained on millions of flight log measurements of control inputs and sensor readings. We develop a new scalable algorithm for robust regression which can simultaneously fit predictive flight dynamics models while identifying and discarding abnormal flight missions from the training set. The resulting unsupervised estimator has a very high breakdown point and can withstand massive contamination of training data to uncover what normal flight patterns look like, without requiring any form of prior knowledge of aircraft aerodynamics or manual labeling of anomalies upfront. Across many different anomaly types, spanning simple 3-sigma statistical thresholds to turbulence and other equipment anomalies, our models achieve high detection rates across the board. Our method consistently outperforms alternative robust detection methods on synthetic benchmark problems. To the best of our knowledge, dynamics modeling of hybrid delivery drones for anomaly detection at the scale of 100 million measurements from 5000 real flight missions in variable flight conditions is unprecedented.
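The idea of fitting a dynamics model while discarding contaminated missions can be sketched with an iteratively trimmed least-squares loop on synthetic data. This is a generic robust estimator of our own, not the paper's algorithm, and the linear "flight dynamics" are illustrative.

```python
import numpy as np

def trimmed_lstsq(X, y, keep=0.8, iters=10):
    """Refit least squares on the `keep` fraction of points with smallest residuals."""
    idx = np.arange(len(y))
    k = int(keep * len(y))
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid = np.abs(X @ w - y)
        idx = np.argsort(resid)[:k]  # discard the worst-fitting "missions"
    return w, resid

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))        # synthetic flight-log regressors
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(200)
y[:30] += 20.0                           # 15% grossly anomalous flights
w_hat, resid = trimmed_lstsq(X, y)
```

The final residuals separate cleanly: anomalous flights retain large errors under the fitted normal-flight model, which is exactly the signal used for detection.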
Stochastic Flows and Geometric Optimization on the Orthogonal Group
David Cheikhi
Jared Davis
Valerii Likhosherstov
Achille Nazaret
Achraf Bahamou
Xingyou Song
Mrugank Akarte
Jack Parker-Holder
Jacob Bergquist
Yuan Gao
Aldo Pacchiano
Adrian Weller
Thirty-seventh International Conference on Machine Learning (ICML 2020) (to appear)
We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group O(d) and naturally reductive homogeneous manifolds obtained from the action of the rotation group SO(d). We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult Humanoid agent from OpenAI Gym and improving convolutional neural networks.
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
Aldo Pacchiano
Jack Parker-Holder
Yunhao Tang
Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019)
We present a new algorithm ASEBO for optimizing high-dimensional blackbox functions. ASEBO adapts to the geometry of the function and learns optimal sets of sensing directions, which are used to probe it, on-the-fly. It addresses the exploration-exploitation trade-off of blackbox optimization with expensive blackbox queries by continuously learning the bias of the lower-dimensional model used to approximate gradients of smoothings of the function via compressed sensing and contextual bandits methods. To obtain this model, it leverages techniques from the emerging theory of active subspaces in the novel ES blackbox optimization context. As a result, ASEBO learns the dynamically changing intrinsic dimensionality of the gradient space and adapts to the hardness of different stages of the optimization without external supervision. Consequently, it leads to more sample-efficient blackbox optimization than state-of-the-art algorithms. We provide theoretical results and test ASEBO advantages over other methods empirically by evaluating it on the set of reinforcement learning policy optimization tasks as well as functions from the recently open-sourced Nevergrad library.
Provably Robust Blackbox Optimization for Reinforcement Learning
Aldo Pacchiano
Jack Parker-Holder
Yunhao Tang
Yuxiang Yang
Conference on Robot Learning (CoRL 2019)
Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to 23% of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks, including path tracking for quadruped robots.
We propose a simple drop-in noise-tolerant replacement for the standard finite difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of deterministic or randomized structured matrices. We show that at the small cost of computing a Fast Fourier Transform (FFT), such structured finite differences consistently give higher quality approximation of gradients and Jacobians in comparison to vanilla approaches that use coordinate directions or random Gaussian perturbations. We show that linearization of noisy, blackbox dynamics using our methods leads to improved performance of trajectory optimizers like Iterative LQR and Differential Dynamic Programming on several classic continuous control tasks. By embedding structured exploration in implicit filtering methods, we are able to learn agile walking and turning policies for quadruped locomotion that successfully transfer from simulation to actual hardware. We give a theoretical justification of our methods in terms of bounds on the quality of gradient reconstruction in the presence of noise.
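The contrast with coordinate-wise differencing can be sketched by recovering a gradient from directional derivatives taken along the rows of a structured probe matrix (here, a random orthogonal matrix). This illustrates the general idea, not the paper's FFT-based construction.

```python
import numpy as np

def fd_gradient(f, x, dirs, h=1e-4):
    """Recover grad f(x) from central differences along the rows of `dirs`."""
    vals = np.array([(f(x + h * d) - f(x - h * d)) / (2.0 * h) for d in dirs])
    # Directional derivatives satisfy dirs @ grad = vals; invert the probe matrix.
    return np.linalg.solve(dirs, vals)

rng = np.random.default_rng(0)
n = 6
dirs, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal probe directions
f = lambda z: float(np.sin(z).sum())
x = rng.standard_normal(n)
g = fd_gradient(f, x, dirs)
```

Because the probe matrix is orthogonal, the linear solve is perfectly conditioned, so measurement noise in the function values is not amplified when reconstructing the gradient.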
Learning Stabilizable Dynamical Systems via Control Contraction Metrics
Sumeet Singh
Jean-Jacques Slotine
Marco Pavone
Workshop on Algorithmic Foundations of Robotics (WAFR-2018) (to appear)
We propose a novel framework for learning stabilizable nonlinear dynamical systems for continuous control tasks in robotics. The key idea is to develop a new control-theoretic regularizer for dynamics fitting rooted in the notion of stabilizability, which guarantees that the learned system can be accompanied by a robust controller capable of stabilizing any open-loop trajectory that the system may generate. By leveraging tools from contraction theory, statistical learning, and convex optimization, we provide a general and tractable semi-supervised algorithm to learn stabilizable dynamics, which can be applied to complex underactuated systems. We validated the proposed algorithm on a simulated planar quadrotor system and observed notably improved trajectory generation and tracking performance with the control-theoretic regularized model over models learned using traditional regression techniques, especially when using a small number of demonstration examples. The results presented illustrate the need to infuse standard model-based reinforcement learning algorithms with concepts drawn from nonlinear control theory for improved reliability.
Learning-based Air Data System for Safe and Efficient Control of Fixed-Wing Aerial Vehicles
Brandon Jones
Damien Jourdan
Maciej Chociej
Byron Boots
The 16th IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR-2018) (to appear)
We develop an air data system for aerial robots executing high-speed outdoor missions subject to significant aerodynamic forces on their bodies. The system is based on a combination of Extended Kalman Filtering (EKF) and autoregressive feedforward Neural Networks, relying only on IMU sensors and GPS. This eliminates the need to instrument the vehicle with Pitot tubes and mechanical vanes, reducing associated cost, weight, maintenance requirements and likelihood of catastrophic mechanical failures. The system is trained to clone the behaviour of Pitot-tube measurements on thousands of instrumented simulated and real flights, and does not require a vehicle aerodynamics model. We demonstrate that safe guidance and navigation is possible in executing complex maneuvers in the presence of wind gusts without relying on airspeed sensors. We also demonstrate accuracy enhancements from successful “simulation-to-reality” transfer and dataset aggregation techniques to correct for training-test distribution mismatches when the air-data system and the control stack operate in closed loop.
Policies Modulating Trajectory Generators
Erwin Coumans
2nd Annual Conference on Robot Learning, CoRL 2018, PMLR, pp. 916-926
We propose an architecture for learning complex controllable behaviors by having simple Policies Modulate Trajectory Generators (PMTG), a powerful combination that can provide both memory and prior knowledge to the controller. The result is a flexible architecture that is applicable to a class of problems with periodic motion for which one has an insight into the class of trajectories that might lead to a desired behavior. We illustrate the basics of our architecture using a synthetic control problem, then go on to learn speed-controlled locomotion for a quadrupedal robot by using Deep Reinforcement Learning and Evolutionary Strategies. We demonstrate that a simple linear policy, when paired with a parametric Trajectory Generator for quadrupedal gaits, can induce walking behaviors with controllable speed from 4-dimensional IMU observations alone, and can be learned in under 1000 rollouts. We also transfer these policies to a real robot and show locomotion with controllable forward velocity.
The Geometry of Random Features
Mark Rowland
Richard Turner
Adrian Weller
International Conference on Artificial Intelligence and Statistics (AISTATS) (2018)
We present an in-depth examination of the effectiveness of radial basis function kernel (beyond Gaussian) estimators based on orthogonal random feature maps. We show that orthogonal estimators outperform state-of-the-art mechanisms that use iid sampling under weak conditions for tails of the associated Fourier distributions. We prove that for the case of many dimensions, the superiority of the orthogonal transform over iid methods can be accurately measured by a property we define called the charm of the kernel, and that orthogonal random features provide optimal kernel estimators. Furthermore, we provide the first theoretical results which explain why orthogonal random features outperform unstructured ones on downstream tasks such as kernel ridge regression, by showing that orthogonal random features provide kernel algorithms with better spectral properties than the previous state of the art. Our results enable practitioners more generally to estimate the benefits from applying orthogonal transforms.
We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies we learn have several advantages over unstructured ones, including faster training algorithms and faster inference. These benefits are important when the policy is deployed on real hardware with limited resources. Further, compact policies provide more scalable architectures for derivative-free optimization (DFO) in high-dimensional spaces. We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters, with almost linear time complexity of the inference phase, with up to 13x fewer parameters relative to the Evolution Strategies (ES) algorithm introduced by Salimans et al. (2017). We do not need heuristics such as fitness shaping to learn good quality policies, resulting in a simple and theoretically motivated training mechanism.
We develop TROSS, a solver for constrained nonsmooth trajectory optimization based on a sequential operator splitting framework. TROSS iteratively improves trajectories by solving a sequence of subproblems set up within evolving trust regions around current iterates, using the Alternating Direction Method of Multipliers (ADMM). TROSS achieves consensus among competing objectives, such as finding low-cost dynamically feasible trajectories respecting control limits and safety constraints. A library of building blocks, in the form of inexpensive and parallelizable proximal operators associated with trajectory costs and constraints, can be used to configure the solver for a variety of tasks. The method shows faster cost reduction compared to the iterative Linear Quadratic Regulator (iLQR) and Sequential Quadratic Programming (SQP) on a control-limited vehicle maneuvering task. We demonstrate TROSS on shortest-path navigation of a variant of the Dubins car in the presence of obstacles, while exploiting passive dynamics of the system. When applied to a constrained robust state estimation problem involving nondifferentiable nonconvex penalties, TROSS shows less susceptibility to non-Gaussian dynamics disturbances and measurement outliers in comparison to an Extended Kalman smoother. Unlike generic SQP methods, our approach produces time-varying linear feedback control policies even for constrained control tasks. The solver is potentially suitable for nonlinear model predictive control and moving horizon state estimation in embedded systems.
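Two of the kinds of building-block proximal operators mentioned, projection onto control limits and a nonsmooth l1 penalty, are simple enough to sketch. These are the standard textbook operators, not code from the solver's own library.

```python
import numpy as np

def prox_box(u, lo, hi):
    """Prox of the indicator of [lo, hi]: Euclidean projection (e.g., control limits)."""
    return np.clip(u, lo, hi)

def prox_l1(u, lam):
    """Prox of lam * ||.||_1: soft-thresholding, a standard nonsmooth-cost operator."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

u = np.array([-2.0, 0.3, 1.5])
boxed = prox_box(u, -1.0, 1.0)
shrunk = prox_l1(u, 0.5)
```

Each operator is cheap and elementwise, which is what makes the ADMM subproblems inexpensive and parallelizable across trajectory costs and constraints.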
View details
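The core ADMM pattern the abstract relies on, achieving consensus between a cost term and a constraint via cheap proximal operators, can be sketched on a toy problem. This is not the TROSS solver itself, just the two-operator splitting it composes at scale; the function name and parameters are illustrative:

```python
import numpy as np

def admm_box_consensus(x_ref, lo, hi, rho=1.0, iters=200):
    """Toy ADMM consensus between a quadratic cost and a box constraint.

    Splitting: min_x 0.5*||x - x_ref||^2 + I_box(z)  subject to  x = z.
    Each iteration applies two inexpensive proximal operators, the same
    building-block pattern a proximal-splitting trajectory solver uses
    for costs, control limits, and safety constraints.
    """
    x = np.zeros_like(x_ref)
    z = np.zeros_like(x_ref)
    u = np.zeros_like(x_ref)                         # scaled dual variable
    for _ in range(iters):
        x = (x_ref + rho * (z - u)) / (1.0 + rho)    # prox of the quadratic
        z = np.clip(x + u, lo, hi)                   # prox of the box indicator
        u = u + x - z                                # dual update
    return z

print(admm_box_consensus(np.array([2.0, -3.0, 0.5]), -1.0, 1.0))
```

The iterates converge to the projection of `x_ref` onto the box, i.e., the consensus point that both proximal operators accept.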
Preview abstract
From a small number of calls to a given “blackbox” on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian, such as sparsity and symmetry, while being robust to noise. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data efficiency in the task of linearizing the dynamics of a rigid-body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems.
View details
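The graph-coloring idea above, probing several Jacobian columns with a single blackbox call when their row supports are disjoint, can be sketched as follows. The function name and the toy banded map are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def jacobian_by_coloring(f, x, pattern, col_groups, eps=1e-6):
    """Sparse Jacobian recovery: one finite-difference probe per color.

    pattern: boolean (m, d) sparsity mask of the Jacobian.
    col_groups: list of column-index lists; columns in the same group must
    have disjoint row supports (a proper coloring of the column graph), so
    their contributions to a joint probe never mix.
    """
    J = np.zeros(pattern.shape)
    f0 = f(x)
    for group in col_groups:
        e = np.zeros_like(x)
        e[group] = 1.0
        df = (f(x + eps * e) - f0) / eps      # one blackbox call per group
        for j in group:
            rows = pattern[:, j]
            J[rows, j] = df[rows]             # disjoint supports: no mixing
    return J

# Usage: a banded map on R^4 needs 2 probes instead of 4.
def f(x):
    return np.array([x[0] + 2 * x[1], 3 * x[1], x[2] ** 2, x[2] + x[3]])

pattern = np.array([[1, 1, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 1, 0],
                    [0, 0, 1, 1]], dtype=bool)
J = jacobian_by_coloring(f, np.ones(4), pattern, [[0, 2], [1, 3]])
```

With a known sparsity pattern, the number of probes drops from the input dimension to the chromatic number of the column-conflict graph.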
Preview abstract
Motivated by applications in robotics and computer vision, we study problems related to spatial reasoning about a 3D environment using sublevel sets of polynomials. These include: tightly containing a cloud of points (e.g., representing an obstacle) with convex or nearly convex basic semialgebraic sets; computing Euclidean distances between two such sets; separating two convex basic semialgebraic sets that overlap; and tightly containing the union of several basic semialgebraic sets with a single convex one. We use algebraic techniques from sum-of-squares optimization that reduce all of these tasks to semidefinite programs of small size, and we present numerical experiments in realistic scenarios.
View details
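One representative program from this family, tightly containing a point cloud $\{x_i\}$ with the 1-sublevel set of a convex polynomial, can be written as follows. This is a sketch under stated assumptions: the volume surrogate and the degree bound $2d$ are modeling choices, not necessarily the paper's exact formulation:

```latex
\begin{aligned}
\min_{p \in \mathbb{R}[x]_{2d}} \quad
  & \operatorname{vol}\{x : p(x) \le 1\}
    \quad \text{(or a tractable surrogate of the volume)} \\
\text{s.t.} \quad
  & p(x_i) \le 1, \qquad i = 1, \dots, N
    \quad \text{(every point is contained)}, \\
  & y^\top \nabla^2 p(x)\, y \ \text{is a sum of squares in } (x, y)
    \quad \text{(SOS-convexity of } p\text{)}.
\end{aligned}
```

Replacing the SOS-convexity certificate and the containment inequalities with semidefinite constraints on the coefficient vector of $p$ yields a semidefinite program of modest size.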
Manifold Regularization for Kernelized LSTD
Xinyan Yan
Byron Boots
1st Annual Conference on Robot Learning (CoRL 2017) (to appear)
Preview abstract
Policy evaluation or value function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore, its quality has a significant impact on most RL algorithms.
Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in state-action value function approximation.
Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality.
View details
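The manifold-regularization idea, penalizing value estimates that vary sharply along a data-derived graph over states, can be illustrated with a Laplacian-regularized kernel least-squares fit. This is a hedged sketch of the regularizer only, not the paper's kernelized LSTD; all names and parameters are illustrative:

```python
import numpy as np

def laplacian_kernel_regression(X, y, sigma=1.0, lam=1e-2, lam_m=1e-1, knn=3):
    """Manifold-regularized kernel least squares (Laplacian RLS sketch).

    Fits f(x) = sum_i alpha_i k(x_i, x) by solving
        (K + lam*I + lam_m * L K) alpha = y,
    where L is the unnormalized Laplacian of a kNN graph over the inputs,
    so the fit is additionally smooth along the data manifold.
    """
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian kernel matrix
    # Symmetric kNN adjacency and unnormalized graph Laplacian.
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:knn + 1]:
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(1)) - W
    alpha = np.linalg.solve(K + lam * np.eye(n) + lam_m * (L @ K), y)
    return K, alpha
```

In the policy-evaluation setting, the regression targets would come from Bellman backups rather than raw labels; the Laplacian term is what injects the learned state-space geometry.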
Learning Compact Recurrent Neural Networks
Preview
Zhiyun Lu
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Recycling Randomness with Structure for Sublinear time Kernel Expansions
Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, JMLR.org, pp. 2502-2510
Preview abstract
We propose a scheme for recycling Gaussian random vectors into structured matrices to approximate various kernel functions in sublinear time via random embeddings. Our framework includes the Fastfood construction of Le et al. (2013) as a special case, but also extends to Circulant, Toeplitz and Hankel matrices, and the broader family of structured matrices characterized by the concept of low displacement rank. We introduce notions of coherence and graph-theoretic structural constants that control the approximation quality, and prove unbiasedness and low-variance properties of random feature maps that arise within our framework. For the case of low-displacement matrices, we show how the degree of structure and randomness can be controlled to reduce statistical variance at the cost of increased computation and storage requirements. Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features.
View details
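The recycling idea can be sketched for the circulant case: a single Gaussian vector defines the whole projection matrix, and the projection is applied by FFT in O(d log d) instead of a dense O(d^2) multiply. The function name and parameters are illustrative, and this is the standard random-Fourier-feature construction with a circulant projection, not the paper's full framework:

```python
import numpy as np

def circulant_random_features(X, sigma=1.0, rng=0):
    """Random Fourier features with a circulant projection applied by FFT.

    A circulant matrix built from one Gaussian vector stands in for the
    dense d x d Gaussian projection of ordinary random Fourier features:
    storage drops from d^2 to d and each projection costs O(d log d).
    Each row of the circulant matrix is a cyclic shift of g, so each row
    is still marginally a standard Gaussian vector.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    g = rng.standard_normal(d)               # one vector defines the matrix
    b = rng.uniform(0, 2 * np.pi, d)         # random phases
    # Circulant matvec via circular convolution: C x = ifft(fft(g) * fft(x)).
    proj = np.fft.ifft(np.fft.fft(g) * np.fft.fft(X, axis=1), axis=1).real
    return np.sqrt(2.0 / d) * np.cos(proj / sigma + b)

# The inner product z(x) . z(y) approximates the Gaussian kernel
# exp(-||x - y||^2 / (2 sigma^2)); rows of C are correlated, which is the
# variance-vs-structure tradeoff the abstract analyzes.
```

The correlation between the circulant rows is exactly what the coherence and structural constants in the abstract quantify.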
Preview abstract
We consider the task of building compact deep learning pipelines suitable for deployment on storage- and power-constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter-sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state-of-the-art models, while providing more than 3.5-fold compression.
View details
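A concrete instance of the compactness and speed the abstract claims is a Toeplitz layer (a displacement rank <= 2 structure): 2n - 1 parameters define an n x n matrix, and the matvec runs in O(n log n) by embedding into a circulant and using the FFT. This is a generic sketch of the Toeplitz fast multiply, not the paper's training framework:

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply an n x n Toeplitz matrix by a vector in O(n log n) via FFT.

    c: first column, r: first row (with c[0] == r[0]). Only 2n - 1
    parameters define the matrix, versus n^2 for an unstructured layer,
    which is the kind of parameter sharing a low-displacement-rank
    transform exploits for both compression and speed.
    """
    n = len(x)
    # Embed the Toeplitz matrix into a (2n)-point circulant whose first
    # column is [c, 0, r[n-1], ..., r[1]], then use circular convolution.
    col = np.concatenate([c, [0.0], r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) *
                    np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real
```

Because both the forward pass and (by the same trick, transposed) the backward pass reduce to FFTs, such layers accelerate training as well as inference.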