Learning Human-Like Driving Behavior via Augmented Reward and Data


At a glance

An autonomous vehicle (AV) that behaves unlike a human-driven vehicle (HDV) can create hazards for surrounding traffic. We propose a methodology that enables reinforcement learning (RL) to learn human-like driving behaviors through an augmented reward function. One reward term is learned from demonstrations with augmented data; the augmented dataset includes infrequent driving cases such as recovering from off-road driving or near-crash avoidance. The other reward term is semantically designed from simplified representations of driving-behavior evaluation metrics. The first term can be a surrogate reward derived via imitation learning (IL), where a behavior estimator, rather than a policy (a mapping to actions) as in standard IL, is learned from data. As an alternative, we will apply adversarial inverse reinforcement learning (IRL) to learn the reward function directly from data and then use it in RL. Furthermore, we will enforce driving-safety constraints during learning via constrained optimization methods, e.g. Constrained Policy Optimization (CPO), to ensure safe driving behaviors.
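To make the reward structure concrete, the sketch below shows one plausible way to combine a learned reward term with a semantically designed term. All names, weights, and the specific semantic metrics (speeding penalty, lane-center offset) are illustrative assumptions, not the paper's actual design.

```python
def semantic_reward(speed, lane_offset, speed_limit=30.0):
    """Hypothetical semantic term built from simplified driving-behavior
    metrics: penalize exceeding the speed limit and drifting from lane center."""
    speeding = max(0.0, speed - speed_limit)
    return -0.1 * speeding - 1.0 * abs(lane_offset)

def augmented_reward(learned_term, speed, lane_offset,
                     w_learned=1.0, w_semantic=0.5):
    """Combine the term learned from (augmented) demonstrations with the
    semantic term; the weights are illustrative tuning knobs."""
    return (w_learned * learned_term
            + w_semantic * semantic_reward(speed, lane_offset))

# Example: a learned term of 0.8, driving at the limit and centered in lane,
# leaves the reward unchanged; speeding or drifting reduces it.
r_nominal = augmented_reward(0.8, speed=30.0, lane_offset=0.0)   # 0.8
r_speeding = augmented_reward(0.8, speed=35.0, lane_offset=0.2)  # lower
```

In practice the learned term would come from the behavior estimator (or the adversarial-IRL reward) evaluated on the current state-action pair, while the semantic term is computed directly from the simulator state.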


Pin Wang
Ching-Yao Chan
Sergey Levine


Human-Like Driving Behavior, Reinforcement Learning, Imitation Learning, Inverse Reinforcement Learning, Augmented Reward and Data