Optimizing Simulations with Noise-Tolerant Structured Exploration
We propose a simple drop-in noise-tolerant replacement for the standard finite difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of deterministic or randomized structured matrices. We show that at the small cost of computing a Fast Fourier Transform (FFT), such structured finite differences consistently give higher quality approximation of gradients and Jacobians in comparison to vanilla approaches that use coordinate directions or random Gaussian perturbations. We show that linearization of noisy, blackbox dynamics using our methods leads to improved performance of trajectory optimizers like Iterative LQR and Differential Dynamic Programming on several classic continuous control tasks. By embedding structured exploration in implicit filtering methods, we are able to learn agile walking and turning policies for quadruped locomotion, that successfully transfer from simulation to actual hardware. We give a theoretical justification of our methods in terms of bounds on the quality of gradient reconstruction in the presence of noise.