Model-Free Reinforcement Learning Through the Optimization Lens


At a glance

Recently there has been renewed interest in model-free reinforcement learning (MFRL) for training agents to perform tasks in simulated settings. However, the difficulty of training these models and their brittleness in the real world call into question their utility for actual deployment in autonomous transportation systems. Recent studies demonstrate that these methods are not even robust to changes in the random seed [2]. Algorithms with such fragilities cannot be integrated into autonomous driving platforms without significant simplification and robustification.

We propose to fix these fragilities using tools from nonlinear optimization. Indeed, MFRL can be naturally posed as a problem of derivative-free optimization: most MFRL formulations seek a policy that maximizes an expected reward. The difficulty in optimizing the policy parameters arises from the inability to compute gradients of this objective. In the continuous control problems that arise in autonomous driving, however, the underlying objective is smooth and nonlinear. Hence, we can use techniques from smooth derivative-free optimization to develop new strategies for reinforcement learning.
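As a minimal sketch of this viewpoint, the following illustrates a basic random-search method that maximizes a reward function using only function evaluations, estimating a descent direction from antithetic perturbations of the parameters. The function names and the toy quadratic "expected reward" below are illustrative assumptions, not the proposal's actual method or benchmark.

```python
import numpy as np

def basic_random_search(reward, theta, n_iters=300, n_dirs=8,
                        step=0.05, noise=0.05, seed=0):
    """Maximize `reward` over parameters `theta` without gradients.

    Each iteration samples random directions, evaluates the reward at
    antithetic perturbations theta +/- noise * delta, and moves along a
    finite-difference estimate of the gradient.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, theta.size))
        grad_est = np.zeros_like(theta)
        for d in deltas:
            r_plus = reward(theta + noise * d)
            r_minus = reward(theta - noise * d)
            # Antithetic evaluations cancel even-order terms, giving a
            # low-bias directional-derivative estimate along d.
            grad_est += (r_plus - r_minus) * d
        theta = theta + step * grad_est / (n_dirs * noise)
    return theta

# Toy smooth "expected reward": concave, maximized at theta = [1, -2].
target = np.array([1.0, -2.0])
reward = lambda th: -np.sum((th - target) ** 2)

theta_star = basic_random_search(reward, np.zeros(2))
```

Because the toy objective is smooth, the perturbation-based estimate behaves like a stochastic gradient, and the iterates contract toward the maximizer despite never computing a derivative.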

Benjamin Recht

Keywords: Reinforcement Learning, Derivative-Free Optimization, Robustness