Fast Simultaneous Object Detection and Segmentation


At a glance

Following dramatic advances in whole-image classification, convolutional networks have recently advanced the state of the art in semantic segmentation, i.e., pixel labeling. These advances come not from a new type of machinery but from reinterpreting and repurposing powerful classification nets as fully convolutional networks (FCNs) that directly output per-pixel predictions. Although FCNs are powerful labelers, pixel labeling alone cannot distinguish between adjacent objects of the same class. Objects can still be distinguished in a fully convolutional fashion with a powerful upgrade: instead of labeling individual pixels, the network labels pairs of pixels as belonging to the same object or to different objects. With this upgrade, the team has built an efficient convnet system that performs both segmentation and object detection in a fraction of a second. This makes the team’s system a natural choice for real-time perception, and the next step is to bring it to video.
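A minimal sketch of why pairwise labels can separate touching objects of the same class. It assumes, hypothetically, that the network's pairwise output has been reduced to a same-object probability for each pair of adjacent pixels. Thresholding those probabilities and merging pixels with union-find turns pair labels into instance segments. The function name `segment_instances`, the dictionary input format, and the 0.5 threshold are illustrative choices, not the team's actual interface or grouping procedure.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def _find(parent: Dict[Pixel, Pixel], p: Pixel) -> Pixel:
    """Return the root of p's set, halving the path along the way."""
    while parent[p] != p:
        parent[p] = parent[parent[p]]
        p = parent[p]
    return p


def segment_instances(
    same_object: Dict[Tuple[Pixel, Pixel], float],
    height: int,
    width: int,
    threshold: float = 0.5,
) -> List[List[int]]:
    """Group pixels into instances from pairwise same-object probabilities.

    `same_object` maps a pair of adjacent pixels to a (hypothetical)
    predicted probability that both pixels belong to the same object.
    """
    parent = {(r, c): (r, c) for r in range(height) for c in range(width)}
    for (a, b), prob in same_object.items():
        if prob >= threshold:
            root_a, root_b = _find(parent, a), _find(parent, b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two partial instances
    # Relabel set roots as consecutive instance ids in raster order.
    ids: Dict[Pixel, int] = {}
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            root = _find(parent, (r, c))
            labels[r][c] = ids.setdefault(root, len(ids))
    return labels


if __name__ == "__main__":
    # Four pixels in a row, all of class "car": two touching cars.
    pairs = {
        ((0, 0), (0, 1)): 0.9,
        ((0, 1), (0, 2)): 0.1,  # low probability: boundary between objects
        ((0, 2), (0, 3)): 0.8,
    }
    print(segment_instances(pairs, height=1, width=4))  # [[0, 0, 1, 1]]
```

A per-pixel labeler would mark all four pixels "car" and return one undivided region; the single low pairwise probability is enough to split it into two instances.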

Beyond simply evaluating the team’s recently developed method on video datasets such as KITTI, video offers an opportunity to use motion cues to aid segmentation. Initial efforts in this direction are promising: even a network trained only for classification can provide some high-confidence cues about object segments simply by comparing features across frames. This information can be used both at test time, to improve output quality, and at training time, to provide an additional form of supervision without any additional labeling.
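One way such a cross-frame comparison could look, as a sketch under stated assumptions: each frame is represented by a list of feature vectors at image locations (hypothetically taken from a classification network's intermediate layer), and a correspondence counts as high-confidence only if the two features are mutual nearest neighbors under cosine similarity and their similarity exceeds a threshold. The mutual-nearest-neighbor criterion, the function `confident_matches`, and `min_sim=0.9` are stand-ins chosen for illustration, not the team's published method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query: Vector, candidates: Sequence[Vector]) -> Tuple[int, float]:
    """Index and similarity of the candidate most similar to `query`."""
    scores = [cosine(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def confident_matches(
    feats_a: Sequence[Vector],
    feats_b: Sequence[Vector],
    min_sim: float = 0.9,
) -> List[Tuple[int, int, float]]:
    """Keep only mutual nearest neighbors whose similarity is >= min_sim."""
    matches = []
    for i, fa in enumerate(feats_a):
        j, sim = nearest(fa, feats_b)
        back, _ = nearest(feats_b[j], feats_a)  # check the reverse direction
        if back == i and sim >= min_sim:
            matches.append((i, j, sim))
    return matches


if __name__ == "__main__":
    frame_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    frame_t1 = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0]]
    pairs = confident_matches(frame_t, frame_t1)
    print([(i, j) for i, j, _ in pairs])  # [(0, 1), (1, 0)]
```

Matches that pass both checks could act as test-time constraints or as free training targets, while ambiguous locations (the third feature in the example) are left unlabeled rather than guessed.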
Principal investigator: Trevor Darrell
Researchers: Fisher Yu, Dequan Wang, Evan Shelhamer
Themes: Semantic Segmentation, Convolutional Networks