Guaranteed Adaptation and Exploration in Uncertain Environments


At a glance

Guaranteed robust execution and safety of machine learning for autonomous driving remain significant hurdles to clear before such systems can be deployed at scale. Failure of driving systems would have severe social and economic consequences, including the likely loss of human life. How can we guarantee that our new data-driven automated systems are robust?
In this project, we propose to expand a new paradigm of data-driven control called "Coarse-ID Control" to the setting where we must learn from a changing and dynamic environment. We will link classic work in adaptive control and model predictive control to System Level Synthesis (SLS) tools. The key will be to explicitly account for the uncertainty due to the learning process in control design. We aim to show that systems designed in this manner achieve consistent, safe, and predictable behavior even in uncertain and new environments.
Technical Approach
Coarse-ID control.
In prior work funded by BDD, we established Coarse-ID control, a general framework for data-driven control consisting of the following three steps:

  1. Use supervised learning to learn a coarse model of the dynamical system to be controlled. We refer to the system estimate as the nominal system.
  2. Using either prior knowledge or statistical tools like the bootstrap, build probabilistic guarantees about the distance between the nominal system and the true, unknown dynamics.
  3. Solve a robust optimization problem that optimizes control of the nominal system while penalizing signals with respect to the estimated uncertainty, ensuring stable, robust execution.

This procedure is well illustrated through the case study of the Linear Quadratic Regulator (LQR). This core problem in optimal control seeks to minimize a quadratic cost (such as the distance from a trajectory or the total jerk of a motion) subject to linear dynamics. The LQR problem with unknown dynamics is, in some sense, the simplest reinforcement learning problem with continuous variables and can be used as a baseline for distinguishing strengths and weaknesses of various methods. Our Coarse-ID paradigm proceeds as follows in this setting: first, we estimate an approximate dynamical model, ^ϑ, by solving a least-squares problem. We then run the bootstrap to guarantee that the true model, ϑ★, is close to ^ϑ in the sense that ∆ϑ = ^ϑ - ϑ★ has small norm with high probability. In light of this, we can pose a robust variant of the standard LQR optimal control problem, computing a robust controller that minimizes the worst-case performance of the system given the (high-probability) norm bounds on ∆ϑ.
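The estimation and bootstrap steps can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes linear dynamics x_{t+1} = A x_t + B u_t + w_t with Gaussian noise of known scale `sigma_w`, data from independent rollouts started at the origin, and a parametric bootstrap that re-simulates from the estimated model. The function names, the example system, and all numerical values are hypothetical.

```python
import numpy as np

def simulate(A, B, U, sigma_w, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t from x_0 = 0 for a batch of input sequences."""
    n_rollouts, T, _ = U.shape
    n = A.shape[0]
    X = np.zeros((n_rollouts, T + 1, n))
    for t in range(T):
        w = sigma_w * rng.standard_normal((n_rollouts, n))
        X[:, t + 1] = X[:, t] @ A.T + U[:, t] @ B.T + w
    return X

def least_squares_estimate(X, U):
    """Estimate (A, B) by regressing x_{t+1} on [x_t, u_t] over all transitions."""
    n, m = X.shape[2], U.shape[2]
    Z = np.concatenate([X[:, :-1], U], axis=2).reshape(-1, n + m)
    Y = X[:, 1:].reshape(-1, n)
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # Y ~ Z @ Theta, Theta = [A B]^T
    return Theta[:n].T, Theta[n:].T

def bootstrap_error_bounds(A_hat, B_hat, U, sigma_w, rng, n_boot=200, delta=0.05):
    """Parametric bootstrap: re-simulate from the estimate, refit, and take the
    (1 - delta) quantile of the operator-norm errors as the uncertainty radius."""
    errs_A, errs_B = [], []
    for _ in range(n_boot):
        X_b = simulate(A_hat, B_hat, U, sigma_w, rng)
        A_b, B_b = least_squares_estimate(X_b, U)
        errs_A.append(np.linalg.norm(A_b - A_hat, 2))
        errs_B.append(np.linalg.norm(B_b - B_hat, 2))
    return np.quantile(errs_A, 1 - delta), np.quantile(errs_B, 1 - delta)

# Hypothetical example: a discretized double integrator (not from the source).
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
sigma_w = 0.1
U = rng.standard_normal((50, 10, 1))        # 50 rollouts of 10 random inputs each
X = simulate(A_true, B_true, U, sigma_w, rng)

A_hat, B_hat = least_squares_estimate(X, U)                      # nominal system
eps_A, eps_B = bootstrap_error_bounds(A_hat, B_hat, U, sigma_w, rng)
print("||A_hat - A_true|| =", np.linalg.norm(A_hat - A_true, 2), " bootstrap bound:", eps_A)
print("||B_hat - B_true|| =", np.linalg.norm(B_hat - B_true, 2), " bootstrap bound:", eps_B)
```

The radii `eps_A` and `eps_B` play the role of the high-probability norm bounds on ∆ϑ that the robust design step then consumes.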
We leveraged the recently developed System Level Synthesis (SLS) framework [3] to solve this robust optimization problem. SLS lifts the system description into a higher-dimensional space that enables efficient search for controllers. At the cost of some conservatism, we could guarantee robust stability of the resulting closed-loop system for all admissible perturbations and furnish the first non-asymptotic bounds on LQR [2]. In particular, our method guarantees safe execution in a variety of regimes where non-robust techniques and reinforcement learning methods fail.
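One schematic way to write the robust LQR problem described above is the following. The symbols K (the controller), ε (the high-probability bound on the norm of ∆ϑ), Q and R (the cost weights), and w_t (the process noise) are notation assumed here for illustration and do not appear in the original text; the model ϑ is taken to be the pair (A, B).

```latex
\begin{aligned}
\underset{K}{\text{minimize}} \quad
  & \sup_{\|\Delta\vartheta\| \le \varepsilon} \;
    J\bigl(\hat{\vartheta} - \Delta\vartheta,\; K\bigr), \\
\text{where} \quad
  & J(\vartheta, K) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T}
    \mathbb{E}\!\left[ x_t^\top Q\, x_t + u_t^\top R\, u_t \right], \\
  & x_{t+1} = A x_t + B u_t + w_t, \qquad u_t = K x_t, \qquad \vartheta = (A, B).
\end{aligned}
```

Because ∆ϑ = ^ϑ - ϑ★, the true model ϑ★ = ^ϑ - ∆ϑ lies in the set over which the supremum is taken whenever the norm bound holds, so a controller that performs well in the worst case also performs well on the true system.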
Adaptive control and environmental uncertainty. The first ingredient in handling environmental uncertainty is a rigorous framework for adaptive control that enables systems to incorporate new observations into coarse models to refine constraint sets and costs. In this project, we will investigate how to design such a framework using techniques established in Coarse-ID control.
The first step will be to extend our work on LQR to the adaptive setting, for which we propose a hybrid of SLS and model predictive control (MPC). In MPC, control problems are approximately solved on finite time horizons, one step is taken, and then this process is repeated [1]. We will extend Coarse-ID Control to MPC, with particular attention to avoiding constraint violation when the dynamics are approximate.
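The receding-horizon pattern described above (plan over a finite horizon, apply one step, repeat) can be sketched as follows. This is a simplified illustration under stated assumptions: linear dynamics, a quadratic cost, and no state or input constraints, so each finite-horizon problem is solved exactly by a backward Riccati recursion rather than by the constrained optimization a real MPC would use. The planning model is deliberately mismatched with the true system; the function names and all numerical values are hypothetical.

```python
import numpy as np

def finite_horizon_gain(A, B, Q, R, horizon):
    """Backward Riccati recursion with terminal cost Q.
    Returns the feedback gain for the first step of the horizon."""
    P = Q.copy()
    K = None
    for _ in range(horizon):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

def receding_horizon_rollout(A_model, B_model, A_true, B_true, Q, R, x0, horizon, n_steps):
    """Plan on the (approximate) model, apply only the first input to the true system, repeat."""
    x = x0.copy()
    traj = [x.copy()]
    for _ in range(n_steps):
        # Re-plan at every step. Without constraints and with a fixed model the gain
        # does not change between steps; it is recomputed only to show the MPC loop.
        K = finite_horizon_gain(A_model, B_model, Q, R, horizon)
        u = K @ x
        x = A_true @ x + B_true @ u
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical true system and a slightly wrong planning model.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
A_model = A_true + np.array([[0.0, 0.01], [0.01, 0.0]])
B_model = np.array([[0.0], [0.11]])
Q, R = np.eye(2), np.eye(1)
x0 = np.array([1.0, 0.0])

traj = receding_horizon_rollout(A_model, B_model, A_true, B_true, Q, R, x0,
                                horizon=20, n_steps=100)
print("initial state norm:", np.linalg.norm(traj[0]))
print("final state norm:  ", np.linalg.norm(traj[-1]))
```

With state or input constraints, the Riccati step would be replaced by a constrained finite-horizon optimization, which is where the model error matters for constraint violation.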

Figure 1. A Coarse-ID Control block diagram (left). Coarse-ID Control achieves comparable performance
to naive methods, but is consistently stabilizing whereas non-robust controllers are not (right).

For adaptive control, we will investigate a simplified version of the SLS and MPC hybrid, without state or input constraints, as a subroutine in an adaptive control algorithm, wherein the system model and control policy are updated after a sufficient amount of data about the system has been collected. The performance loss can be quantified as a function of the size of the system uncertainty, using analysis techniques similar to those in the Learning LQR paper [2]. This work will also require the development of novel statistical bounds on learning rates from highly non-independent time series data, and we intend to actively pursue such bounds. By integrating these new statistical developments, we expect to be able to guarantee stability and performance for such adaptive control systems while requiring only verifiable and practically relevant assumptions on the system model.
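The epoch-based update described above can be sketched as a loop that collects data, refits the model, and recomputes the policy. This sketch makes several assumptions that are not in the source: it uses a certainty-equivalent LQR gain computed from the current estimate in place of the robust SLS synthesis, it injects Gaussian exploration noise into the input, and it refits by least squares on all data from a single correlated trajectory. The system, noise levels, and epoch lengths are hypothetical.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Approximate the infinite-horizon LQR gain by iterating the Riccati recursion."""
    P = Q.copy()
    K = None
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

def fit_model(states, inputs, next_states):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t from logged transitions."""
    n = states.shape[1]
    Z = np.hstack([states, inputs])
    Theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    return Theta[:n].T, Theta[n:].T

rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical system
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
sigma_w, sigma_explore = 0.05, 1.0
n_epochs, epoch_len = 5, 50

K = np.zeros((1, 2))          # start with no feedback; only exploration noise
x = np.zeros(2)
xs, us, xns = [], [], []
errors = []
for epoch in range(n_epochs):
    # Collect one epoch of data under the current policy plus exploration noise.
    for _ in range(epoch_len):
        u = K @ x + sigma_explore * rng.standard_normal(1)
        x_next = A_true @ x + B_true @ u + sigma_w * rng.standard_normal(2)
        xs.append(x); us.append(u); xns.append(x_next)
        x = x_next
    # Update the model from all data so far, then update the policy.
    A_hat, B_hat = fit_model(np.array(xs), np.array(us), np.array(xns))
    K = lqr_gain(A_hat, B_hat, Q, R)
    errors.append(np.linalg.norm(A_hat - A_true, 2))

closed_loop_radius = max(abs(np.linalg.eigvals(A_true + B_true @ K)))
print("model error per epoch:", errors)
print("closed-loop spectral radius on the true system:", closed_loop_radius)
```

The data here come from one long, non-independent trajectory, which is the setting where the statistical bounds mentioned above are needed to justify an uncertainty radius for a robust update.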
Safe exploration in autonomous driving cars. We will apply these techniques to adaptive exploration in autonomous cars that aim to stay in safe regions of state-space (such as on a road) while executing agile maneuvers [4]. Using an MPC-based platform developed by the Borrelli lab, we will use SLS to learn nonlinear dynamical models as an autonomous car pushes against the envelope of known safe behavior. The initial autonomous setup will use inertial sensors and pre-programmed knowledge of lane curvature, but later we aim to incorporate vision as well. Starting with standard lane detection, the vision system will build refined estimates through data gathered in autonomous execution.
Beyond refining the lane estimates, we will also evaluate their accuracy to build a model of uncertainty in the lane detection system. Instead of directly feeding the lane position into the tracking controller, we will input both the detected lanes and the uncertainty in the detection. Using this enriched information, the controller will act robustly to prevent bad behavior when lane detection malfunctions.

Principal investigators
Ben Recht