Enabling Valuable Training Data through Domain Adaptation

ABOUT THE PROJECT

At a glance

Training deep neural networks (DNNs) requires large labeled datasets, which are expensive and time-consuming to create. Training on synthetic simulation data with automatically generated annotations instead of real data, as described in Prof. Seshia’s proposal "Safe and Effective Learning and Autonomy through Formal Simulation," removes the need for manual labeling. However, because of dataset bias, also called domain shift [Tzeng2015], models learned from synthetic data do not generalize reliably to real data [Shrivastava2017]. For example, the overall per-pixel label accuracy of a state-of-the-art semantic segmentation model drops from 93% when trained on real imagery to 54% when trained only on synthetic data [Hoffman2018]. How to adapt effectively from synthetic domains to real-world data remains an open problem.

Principal investigators: Kurt Keutzer
Researchers: Bichen Wu, Xiangyu Yue, Sicheng Zhao
Themes: Domain adaptation, Sim2Real, LiDAR-based perception