Learning Human-Like Driving Behavior via Augmented Reward and Data

ABOUT THE PROJECT

At a glance

An autonomous vehicle (AV) that behaves unlike a human-driven vehicle (HDV) can create hazards for surrounding traffic. We propose a methodology that enables reinforcement learning (RL) to learn human-like driving behaviors through an augmented reward function. One reward term is learned from demonstrations with augmented data, where the augmented dataset includes infrequent driving cases such as recovery from off-road driving or near-crash avoidance. The other reward term is semantically designed from simplified representations of driving-behavior evaluation metrics. The first term can be a surrogate reward from imitation learning (IL), in which a behavior estimator is learned from data rather than a set of actions (i.e., the policy) as in standard IL. As an alternative, we will also try adversarial inverse reinforcement learning (IRL) to learn the reward function directly from data and then apply it in RL. Furthermore, we will impose driving safety constraints during learning via constrained optimization methods, e.g., Constrained Policy Optimization (CPO), to ensure safe driving behaviors.
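To make the two-term reward composition concrete, the sketch below shows one way such an augmented reward could be assembled: a learned term from an adversarial-IL-style discriminator plus a semantically designed term built from simple driving metrics. This is a minimal illustration under our own assumptions, not the project's implementation; all names (AugmentedReward, behavior_estimator, the metric terms, and the weight alpha) are hypothetical.

import numpy as np

# Hypothetical sketch of the augmented reward described above:
#   r(s, a) = alpha * r_learned(s, a) + (1 - alpha) * r_semantic(s, a)

class AugmentedReward:
    """Combines a reward term learned from (augmented) demonstrations
    with a semantically designed term built from driving metrics."""

    def __init__(self, behavior_estimator, alpha=0.7):
        # behavior_estimator: any model scoring how "human-like" a
        # state-action pair is, e.g. a discriminator D(s, a) in (0, 1)
        self.estimator = behavior_estimator
        self.alpha = alpha

    def learned_term(self, state, action):
        # Surrogate reward from imitation learning: higher when the
        # discriminator believes the pair came from human demonstrations.
        d = self.estimator(state, action)
        return np.log(d + 1e-8) - np.log(1.0 - d + 1e-8)

    def semantic_term(self, state, action):
        # Simplified representations of driving-behavior metrics;
        # the specific terms and weights here are illustrative only.
        speed_dev = abs(state["speed"] - state["speed_limit"])
        jerk = abs(action["accel"] - state["prev_accel"])
        headway = state["time_headway"]
        return -0.1 * speed_dev - 0.5 * jerk + 0.2 * min(headway, 3.0)

    def __call__(self, state, action):
        return (self.alpha * self.learned_term(state, action)
                + (1.0 - self.alpha) * self.semantic_term(state, action))

In the constrained setting, hard safety requirements (e.g., a bound on expected collision or off-road cost) would enter CPO as explicit cost constraints on the policy rather than as penalty terms folded into this reward.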

PRINCIPAL INVESTIGATORS / RESEARCHERS

Pin Wang
Ching-Yao Chan
Sergey Levine

THEMES

Human-Like Driving Behavior, Reinforcement Learning, Imitation Learning, Inverse Reinforcement Learning, Augmented Reward and Data