Systematic Study of Neural Network Robustness Against Adversarial Attacks


At a glance

Deep neural networks (NNs) are highly vulnerable to adversarial perturbations. Despite significant efforts to develop defense mechanisms, this challenge remains unresolved: to date, all of the proposed defenses have been broken by simple attacks. The issue is particularly critical for autonomous driving. One important problem is that there is very little understanding of the source of NN vulnerability to adversarial attacks, of its dependence on the training dataset and model architecture, and of the trade-off between robustness and accuracy. We plan to address this through a systematic study of adversarial robustness that considers (i) the impact of the training dataset on the model's resulting decision boundary and, in turn, on robustness, (ii) the impact of recently proposed robust optimization methods on the immunity of the model when the dataset contains non-robust features, and (iii) the trade-off between accuracy and robustness. First, it has been shown that certain features in the input data are important sources of vulnerability in NNs. In particular, it has been observed that NNs tend to overfit to non-robust features of the input data. We plan to design automated methods to detect whether a trained NN has overfitted to such non-robust features. Second, we will evaluate the impact of recent advances in robust optimization in the presence of different kinds of non-robust features in the input dataset. In particular, we will focus on novel techniques such as Mix-Up and adversarial training methods. Third, we will perform a systematic study of the trade-off between model accuracy and robustness, focusing specifically on object detection and segmentation problems, for which there is very little work on training-based defense methods.
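
For readers unfamiliar with the Mix-Up technique named above, the following is a minimal sketch of its core idea: training on convex combinations of pairs of examples and their labels, with a mixing coefficient drawn from a Beta distribution. It is an illustration only and not the project's implementation; the function name `mixup_batch`, the parameter `alpha`, and the toy data are assumptions introduced here.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return a mixed batch built from convex combinations of example pairs.

    Hypothetical helper for illustration; `alpha` is the Beta-distribution
    parameter controlling how strongly examples are blended.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))    # random partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


# Toy data (assumed): 4 examples with 3 features each, 2 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

In this sketch a single coefficient is shared by the whole batch, and the mixed labels remain valid probability vectors because each is a convex combination of two one-hot labels.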



Kurt Keutzer
Joseph Gonzalez
Michael Mahoney

Amir Gholami
Yaoqing Yang

Adversarial Defense, Robustness, Adversarial Attack, Non-Robust Features