Learning in repeated Stackelberg interactions with partial feedback in autonomous driving


At a glance

The ability to plan to react to future events is central to many sequential decision-making problems that humans solve with relative ease but that remain challenging for AI systems.
Driving a car effectively, for example, generally requires not only predicting how other cars will move and interact with each other, but also planning how to react to future events. This key property distinguishes closed-loop planning from open-loop planning:
a closed-loop planner generates a set of conditional actions that depend on the stochastic outcome of future events, i.e., the planner takes into account the ability to react to future observations by planning policies.
In contrast, an open-loop planner generates a fixed sequence of future actions and therefore does not account for the ability to react to new observations in the future (by selecting different actions).
In many scenarios, it is clearly insufficient to simply plan actions based on multiple predicted outcomes; instead, an agent must plan to react to different outcomes.
For example, a self-driving car that must commit to a planned sequence of actions ahead of time (i.e., one using an open-loop planner) while attempting to cross a busy intersection will be overly conservative and may become frozen, since it cannot plan to adjust its future behavior based on unpredictable future events (such as whether an oncoming vehicle turns left or goes straight).
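The intersection example can be made concrete with a toy calculation. The following is a minimal sketch, not part of the project's methods: the outcome probabilities, the reward values, and the function names (`reward`, `best_open_loop`, `best_closed_loop`) are all assumptions chosen for illustration. The open-loop planner must fix one action before seeing what the oncoming vehicle does, while the closed-loop planner chooses a policy that maps each observed outcome to an action.

```python
from itertools import product

# Assumed (hypothetical) probabilities for the oncoming vehicle's behavior.
OUTCOMES = {"turns_left": 0.5, "goes_straight": 0.5}
ACTIONS = ("go", "stop")


def reward(action, outcome):
    """Hypothetical rewards: small delay cost, crossing bonus, collision penalty."""
    if action == "stop":
        return -1.0
    return 10.0 if outcome == "turns_left" else -100.0


def best_open_loop():
    """Pick one action that is fixed before the outcome is observed."""
    values = {
        a: sum(p * reward(a, o) for o, p in OUTCOMES.items()) for a in ACTIONS
    }
    best = max(values, key=values.get)
    return best, values[best]


def best_closed_loop():
    """Pick a policy: one action for each outcome that may be observed."""
    outcomes = list(OUTCOMES)
    best_policy, best_value = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(outcomes)):
        policy = dict(zip(outcomes, choice))
        value = sum(p * reward(policy[o], o) for o, p in OUTCOMES.items())
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value


if __name__ == "__main__":
    print("open-loop:", best_open_loop())
    print("closed-loop:", best_closed_loop())
```

Under these assumed numbers, the best open-loop choice is to stop (expected value -1.0), which is the "frozen" behavior described above, whereas the best closed-loop policy goes only when the oncoming vehicle turns left (expected value 4.5).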

Despite the fundamental theoretical difference between open- and closed-loop planning, their trade-offs are often far less clear in practice.
Open-loop planning methods generally trade optimality for computational efficiency, since planning optimal action sequences is far more tractable than planning optimal policies. However, open-loop planners are usually used in systems where plans are refreshed at a fast rate (open-loop feedback control).
Fast re-planning accommodates model prediction errors and enables the system to react quickly to new observations, despite the planner's inability to plan to react to future observations. On the other hand, recent closed-loop planning approaches utilize learned models and policies, making them significantly more efficient and scalable than traditional approaches that rely on dynamic programming and MDP solvers, and more comparable to their open-loop counterparts.
To the best of our knowledge, there has not been a comprehensive evaluation of modern planning methods (e.g., those used in AV systems) that compares open- vs closed-loop planning.
In this project, we aim to analyze the differences between open- and closed-loop planning both empirically and theoretically. This analysis will illuminate the failure modes of existing planning approaches and inform the development of new, theoretically motivated planning algorithms that balance efficiency and optimality.
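The open-loop feedback control scheme mentioned above (an open-loop plan that is refreshed at every step) can be sketched on a toy one-dimensional system. This is an illustrative sketch only: the dynamics `x + u + w`, the action bound `u_max`, the disturbance values, and the names `plan_open_loop` and `run` are assumptions, not a description of any particular AV planner. Each plan is computed as if no future disturbances will occur; with re-planning enabled, only the first action of each fresh plan is executed.

```python
def plan_open_loop(x, target, horizon, u_max=1.0):
    """Plan a fixed action sequence toward `target`, assuming no disturbances."""
    plan = []
    for _ in range(horizon):
        # Move toward the target, limited to the assumed action bound.
        u = max(-u_max, min(u_max, target - x))
        plan.append(u)
        x = x + u  # predicted (disturbance-free) next state
    return plan


def run(x0, target, disturbances, replan):
    """Simulate x <- x + u + w, with or without re-planning at each step."""
    x = x0
    horizon = len(disturbances)
    initial_plan = plan_open_loop(x, target, horizon)
    for t, w in enumerate(disturbances):
        if replan:
            # Refresh the plan from the observed state; execute its first action.
            u = plan_open_loop(x, target, horizon - t)[0]
        else:
            # Blindly follow the plan computed at the start.
            u = initial_plan[t]
        x = x + u + w
    return x


if __name__ == "__main__":
    w = [0.5, -0.5, 0.5, 0.0, 0.0]  # assumed disturbance sequence
    print("no re-planning:", run(0.0, 3.0, w, replan=False))
    print("re-planning:   ", run(0.0, 3.0, w, replan=True))
```

With these assumed disturbances, following the initial plan ends at 3.5, overshooting the target of 3.0, while re-planning at every step ends at 3.0. Note that each refreshed plan is still open-loop: it corrects past errors but does not reason about how the system will react to observations that have not yet arrived.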

Principal investigators: Jiantao Jiao, Joseph Gonzalez

Researchers: Tianhao Wu, Charles Packer, Tianjun Zhang

Themes: closed-loop and open-loop planning, autonomous driving, perception, prediction, theory, robotics