In-Car AI Assistant: Efficient End-to-End Conversational AI System


At a glance

Recent advances in Deep Neural Network (DNN) based conversational AI systems have significantly improved smart assistants. These systems have created a seamless user experience and are now widely adopted in mobile phones and smart speakers. These successes motivate applying dialogue systems to more safety-critical use cases such as cars, where it is crucial that the driver can stay focused on the road rather than deal with complex infotainment interfaces. However, there are many barriers to deploying an accurate AI assistant in cars. First, the technology currently deployed in smart speakers requires a constant cloud connection. This is due to the high computational cost of processing a user’s audio signal, understanding the user’s intent or command, managing dialogue state, and generating human-like speech. Another important challenge is the large size of current state-of-the-art models for speech and language processing tasks, which are based on transformers. The memory footprint of these DNNs is too large for the memory budget of edge devices. To address these challenges, we plan to pursue a multi-faceted approach to designing an end-to-end system that can be efficiently deployed at the edge, within the hardware budget of typical autonomous driving cars. In particular, we will pursue the following directions. First, we will design and train an end-to-end model that receives the user’s acoustic data as input and performs integrated Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue State Management (DM), Natural Language Generation (NLG), and Text-to-Speech (TTS) in one pass. This can lead to an orders-of-magnitude reduction in model footprint and improved latency compared to conventional modular solutions. We will further optimize this prototype through differentiable Neural Architecture Search followed by compression (quantization, hierarchical pruning, and distillation).
If it is of interest to sponsors, we can also investigate co-design focused on a particular processor target.
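
As a rough illustration of why the quantization step mentioned above shrinks model footprint, the following minimal sketch applies symmetric uniform 8-bit quantization to a handful of weights. It is not the project's method: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for illustration. Storing each weight as an 8-bit code instead of a 32-bit float cuts weight storage by about 4x, at the cost of a rounding error of at most half a quantization step per weight.

```python
def quantize_int8(weights):
    """Symmetric uniform quantization: map floats to integer codes in [-127, 127].

    Returns the integer codes and the single scale needed to recover
    approximate float values. (Hypothetical helper, for illustration only.)
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One shared scale so that the largest-magnitude weight maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from any real model.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

In practice, quantization of this kind would be combined with the pruning and distillation steps named above, and the bit width would be chosen per layer against the target hardware budget.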

Principal investigators, researchers, themes
Kurt Keutzer

Amir Gholami

Nicholas Lee

Sheng Shen

Jiachen Lian

Sehoon Kim

Automatic Speech Recognition, Natural Language Understanding, Natural Language Generation, Text-to-Speech, Neural Architecture Search, AI at the Edge, Embedded AI

This project builds upon the work of "Embedded In-Car AI Assistant: Efficient End-to-End Speech Recognition and Natural Language Understanding for Command Recognition at the Edge".