Self-driving by multi-objective reinforcement learning with goal-conditioned policies


At a glance

Classical reinforcement learning provides methods for discovering an optimal policy for a single objective. This approach is good enough for certain problems, but it is at odds with the multi-objective nature of many real-life scenarios, such as driving. In driving, the objectives can change significantly over time. For example, today a car may need to drive on the left side of the road, but in the next country it should drive on the right. Or a car should strictly obey traffic rules now, but not when rushing to a hospital or avoiding an accident. Handcrafting a finite state machine, or rules about what to do in each possible scenario, is impractical. In this research project we propose to investigate multi-objective reinforcement learning for self-driving. It is important to note we are *not* pursuing RL against a single, fixed weighted sum of objectives (that would just be standard RL), but rather RL that can generalize to future, not previously experienced, specifications of multi-goal objectives. Hence the objective is to train artificial agents with the following behavior: (i) they learn an (optimal) policy that can be conditioned on any (weighted) subset of a very large vocabulary of goals, (ii) they have a high-level policy that learns to set goals for the lower-level goal-conditioned policy.
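The two-level setup in (i) and (ii) can be sketched in code: a low-level policy conditioned on both the state and a goal-weight vector over the goal vocabulary, and a high-level policy that emits that weight vector. All dimensions, network shapes, and function names below are illustrative assumptions for a minimal sketch, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the project's settings).
STATE_DIM, N_GOALS, ACTION_DIM, HIDDEN = 8, 4, 2, 32

# Low-level goal-conditioned policy pi(a | s, w): a small MLP whose input
# is the state concatenated with a weight vector w over the goal vocabulary.
W1 = rng.normal(0.0, 0.1, (STATE_DIM + N_GOALS, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, ACTION_DIM))

def low_level_policy(state, goal_weights):
    """Map (state, goal-weight vector) to a continuous action in [-1, 1]."""
    x = np.concatenate([state, goal_weights])
    h = np.tanh(x @ W1)
    return np.tanh(h @ W2)

# High-level policy: maps the state to goal weights (here a softmax, so the
# weights form a distribution over the goal vocabulary).
V1 = rng.normal(0.0, 0.1, (STATE_DIM, N_GOALS))

def high_level_policy(state):
    """Produce a weighting over goals for the low-level policy to pursue."""
    logits = state @ V1
    e = np.exp(logits - logits.max())
    return e / e.sum()

state = rng.normal(size=STATE_DIM)
goal_weights = high_level_policy(state)   # e.g. "mostly obey rules, some speed"
action = low_level_policy(state, goal_weights)
```

Because the goal weights are an explicit input rather than baked into the reward, the same low-level policy can, in principle, be steered at test time by goal combinations never seen during training, which is the generalization property the project targets.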

Principal investigators: Pieter Abbeel
Themes: self-driving, reinforcement learning, deep neural networks, semantic segmentation, joint representation, multiple goal training