Secure and Privacy-Preserving Deep Learning

ABOUT THE PROJECT

At a glance

Deep learning with neural networks has become a highly popular machine learning method due to recent breakthroughs in computer vision, speech recognition, and other areas. These successes are a direct result of the ability to train on large-scale data sets, from labeled photographs for object recognition to parallel texts for machine translation. While such data has so far come largely from public sources, private data aggregated from individuals would not only boost existing deep learning applications but also enable new ones. The increasing prevalence of autonomous vehicles likewise presents new opportunities for collecting and learning from enormous amounts of unstructured data. However, some types of private data, such as e-mails and vehicle location data, can be particularly sensitive. To convince individuals to allow deep learning on such data, strong security and privacy guarantees must be provided.

Protecting privacy requires both preventing leakage of the training data and ensuring that the final model does not reveal private information. Given the opaque nature of deep neural networks, preventing all such leaks is a significant challenge. Moreover, existing deep learning systems such as Caffe, Torch, Theano, and TensorFlow were not designed with security in mind.

This project will investigate a novel combination of techniques for secure, privacy-preserving deep learning. The team’s approach uses trusted hardware to provide end-to-end security for data collection, and differentially private deep learning algorithms to provide provable privacy guarantees for individuals. Together, these yield two strong guarantees: first, the original training data will not be revealed to any party; second, the results of deep learning tasks will be differentially private, so they reveal essentially no new information about any individual in the original training data. The combination also enables a high-performance solution: unlike software-based approaches such as secure multiparty computation (SMC), which incur substantial cryptographic overhead, trusted hardware guarantees security while running at near native speed. By guaranteeing security and privacy for individuals, this solution will enable the collection of enormous amounts of new data for deep learning.
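The project does not say which differentially private training algorithm it uses. As an illustration only, the sketch below shows the core step of one well-known technique, differentially private stochastic gradient descent (DP-SGD, Abadi et al., 2016), swapped in here as an assumption. Each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. The function names (`clip`, `dp_average_gradient`) and parameters (`clip_norm`, `noise_multiplier`) are hypothetical, the code is plain Python rather than any deep learning framework, and it omits the privacy accounting needed to compute the actual (ε, δ) guarantee as well as the trusted-hardware component.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(grad: Vector, clip_norm: float) -> Vector:
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads: List[Vector], clip_norm: float,
                        noise_multiplier: float, rng: random.Random) -> Vector:
    """Hypothetical DP-SGD aggregation: clip, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Noise standard deviation is proportional to the clipping bound, i.e. to
    # the largest change any single example can make to the summed gradient.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # toy per-example gradients
    update = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
    params, lr = [0.5, -0.5], 0.1
    params = [p - lr * u for p, u in zip(params, update)]  # one noisy SGD step
    print(params)
```

Clipping is what makes the noise meaningful: it bounds how much any one individual's data can shift the update, so noise of a fixed scale can mask that individual's contribution.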

Principal investigator: Dawn Song
Researchers: Joe Near, Richard Shin
Themes: Deep Learning, Data Security