Episode 2: (Neural) Control

June 11, 1:30–5:30 PM EDT


Full-Stack GPU Training and an RL Policy Architecture with Optimization-Based Control

We present a full-stack GPU training pipeline that reduces training time by minimizing data transfer between GPU and CPU. Massive parallelization is especially well suited to environments that include an optimization-based controller, since the optimization is typically the slow step. In this setup, training is a few orders of magnitude faster than CPU-based RL: in our task, a policy for traversing stepping stones at various speeds and with various contact sequences converges in around 30 minutes. Moreover, we show that an instantaneous QP + RL policy has performance similar to NMPC while running in a fraction of the time.
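To make the idea concrete, here is a minimal JAX sketch, not the speaker's actual pipeline, of stepping many environments that each wrap a small optimization-based (QP) controller inside a single jitted call, so rollout data never leaves the GPU. All names, dimensions, the equality-constrained toy QP, and the placeholder dynamics are hypothetical.

```python
# Hypothetical sketch: batched QP control + simulation, entirely on device.
import jax
import jax.numpy as jnp

STATE_DIM, CTRL_DIM, NUM_CONSTRAINTS = 6, 3, 2
NUM_ENVS = 4096  # thousands of parallel environments

def solve_eq_qp(H, g, A, b):
    """Solve min_u 0.5 u^T H u + g^T u  s.t.  A u = b via its KKT system."""
    n, m = H.shape[0], A.shape[0]
    kkt = jnp.block([[H, A.T], [A, jnp.zeros((m, m))]])
    rhs = jnp.concatenate([-g, b])
    sol = jnp.linalg.solve(kkt, rhs)
    return sol[:n]  # control u; the trailing m entries are multipliers

def controller(state):
    # Placeholder QP data built from the state; a real whole-body
    # controller would derive these from dynamics and contact constraints.
    H = jnp.eye(CTRL_DIM)
    g = -state[:CTRL_DIM]
    A = jnp.eye(NUM_CONSTRAINTS, CTRL_DIM)  # full row rank, so KKT is solvable
    b = jnp.zeros(NUM_CONSTRAINTS)
    return solve_eq_qp(H, g, A, b)

def env_step(state, action):
    # Toy linear dynamics standing in for a physics simulator.
    next_state = state.at[:CTRL_DIM].add(0.01 * action)
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward

@jax.jit
def batched_step(states):
    actions = jax.vmap(controller)(states)      # batched QP solves
    return jax.vmap(env_step)(states, actions)  # batched sim steps

states = jnp.zeros((NUM_ENVS, STATE_DIM))
states, rewards = batched_step(states)  # one synchronized GPU step
```

Because the controller solve and the simulation step compile into one device-side computation, no per-step CPU-GPU transfer occurs, which is the source of the speedup the abstract describes.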

Dr. Zhaoming Xie


Zhaoming Xie is a final-year PhD student in computer science at the University of British Columbia, working with Michiel van de Panne. His research interests lie mainly in applying reinforcement learning to legged robots, with successful results on both bipeds and quadrupeds.