SSD: Single Shot MultiBox Detector
Venue
Proceedings of the European Conference on Computer Vision (ECCV) (2016) (to appear)
Publication Year
2016
Authors
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg
Abstract
We present a method for detecting objects in images using a single deep neural
network. Our approach, named SSD, discretizes the output space of bounding boxes
into a set of bounding box priors over different aspect ratios and scales per
feature map location. At prediction time, the network generates confidences that
each prior corresponds to objects of interest and produces adjustments to the prior
to better match the object shape. Additionally, the network combines predictions
from multiple feature maps with different resolutions to naturally handle objects
of various sizes. Our SSD model is simple relative to methods that require object
proposals, such as R-CNN and MultiBox, because it completely discards the proposal
generation step and encapsulates all the computation in a single network. This
makes SSD easy to train and straightforward to integrate into systems that require
a detection component. Experimental results on the ILSVRC DET and PASCAL VOC datasets
confirm that SSD achieves performance comparable to methods that use an additional
object proposal step, yet is 100-1000x faster. Compared to other single-stage
methods, SSD has similar or better performance, while providing a unified framework
for both training and inference.
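The abstract does not say how the priors are laid out or how the predicted adjustments are parameterized, so the following is only a minimal sketch under stated assumptions. It places one prior per aspect ratio at every cell of several square feature maps of different resolutions, gives coarser maps larger priors, and applies a set of predicted offsets to a prior. The function names `prior_boxes` and `decode`, the linear scale schedule between `s_min=0.2` and `s_max=0.9`, the aspect ratios {1, 2, 1/2}, and the center-size encoding with log-space width and height are all assumptions for illustration, not details taken from this abstract. Confidence scoring and the network itself are omitted.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def prior_boxes(feature_map_sizes: Sequence[int],
                aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5),
                s_min: float = 0.2,
                s_max: float = 0.9) -> List[Box]:
    """Hypothetical helper: one prior per aspect ratio at every cell of every map.

    Assumption: the scale of the k-th map is spaced linearly between s_min
    and s_max, so later (coarser) maps get larger priors.
    """
    m = len(feature_map_sizes)
    priors: List[Box] = []
    for k, f in enumerate(feature_map_sizes):
        scale = s_min + (s_max - s_min) * k / max(m - 1, 1)
        for i in range(f):          # row index of the cell
            for j in range(f):      # column index of the cell
                cx, cy = (j + 0.5) / f, (i + 0.5) / f  # cell center
                for ar in aspect_ratios:
                    # Keep the area near scale**2 while varying the shape.
                    w = scale * math.sqrt(ar)
                    h = scale / math.sqrt(ar)
                    priors.append((cx, cy, w, h))
    return priors


def decode(prior: Box, offsets: Tuple[float, float, float, float]) -> Box:
    """Hypothetical helper: apply predicted adjustments (dx, dy, dw, dh) to a prior.

    Assumption: center offsets are relative to the prior's size and the
    width/height offsets are in log space.
    """
    pcx, pcy, pw, ph = prior
    dx, dy, dw, dh = offsets
    return (pcx + dx * pw, pcy + dy * ph, pw * math.exp(dw), ph * math.exp(dh))


if __name__ == "__main__":
    priors = prior_boxes([4, 2, 1])
    print(len(priors))  # (16 + 4 + 1) cells * 3 aspect ratios = 63
    # Zero offsets leave a prior unchanged.
    print(decode(priors[-1], (0.0, 0.0, 0.0, 0.0)) == priors[-1])  # True
```

Under these assumptions, every prior on every feature map receives its own prediction, so the finer 4x4 map supplies many small priors and the single-cell map supplies a few large ones. This is one simple way to read the abstract's claim that combining maps of different resolutions handles objects of various sizes.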
