Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
Venue
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Washington, DC, USA (2013)
Publication Year
2013
Authors
Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik
Abstract
Many object detection systems are constrained by the time required to convolve a
target image with a bank of filters that code for different aspects of an object's
appearance, such as the presence of component parts. We exploit locality-sensitive
hashing to replace the dot-product kernel operator in the convolution with a fixed
number of hash-table probes that effectively sample all of the filter responses in
time independent of the size of the filter bank. To show the effectiveness of the
technique, we apply it to evaluate 100,000 deformable-part models requiring over a
million (part) filters on multiple scales of a target image in less than 20 seconds
using a single multi-core processor with 20 GB of RAM. This represents a speed-up of
approximately 20,000 times (four orders of magnitude) compared with
performing the convolutions explicitly on the same hardware. While mean average
precision (mAP) over the full set of 100,000 object classes is around 0.16, due in large
part to the challenges in gathering training data and collecting ground truth for
so many classes, we achieve a mAP of at least 0.20 on a third of the classes and
0.30 or better on about 20% of the classes.
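
The core idea in the abstract, replacing a dot product against every filter with a fixed number of hash-table probes, can be illustrated with a small sketch. The abstract does not say which hash family is used, so this example assumes random-hyperplane (sign) hashing purely for illustration. All names and sizes (`build_tables`, `probe`, `NUM_TABLES`, `BITS` and so on) are hypothetical and are not taken from the paper. Filters are normalized to unit length so that the exact dot product used for re-ranking behaves like a cosine similarity.

```python
import random
from collections import defaultdict

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def make_hyperplanes(num_tables, bits_per_table, dim, rng):
    # One independent set of random hyperplanes per hash table.
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)]

def hash_key(vector, planes):
    # Sign pattern of the projections: similar vectors tend to share keys.
    return tuple(dot(vector, p) >= 0.0 for p in planes)

def build_tables(filters, hyperplanes):
    # Index every filter once, offline; cost grows with the filter bank.
    tables = [defaultdict(list) for _ in hyperplanes]
    for filter_id, f in enumerate(filters):
        for table, planes in zip(tables, hyperplanes):
            table[hash_key(f, planes)].append(filter_id)
    return tables

def probe(window, tables, hyperplanes, min_votes):
    # At query time: exactly len(tables) probes, however many filters exist.
    votes = defaultdict(int)
    for table, planes in zip(tables, hyperplanes):
        for filter_id in table.get(hash_key(window, planes), ()):
            votes[filter_id] += 1
    candidates = [fid for fid, v in votes.items() if v >= min_votes]
    return candidates, votes

# Assumed toy sizes, far smaller than the paper's million-filter setting.
DIM, NUM_FILTERS, NUM_TABLES, BITS = 32, 500, 8, 10
rng = random.Random(0)
filters = [unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])
           for _ in range(NUM_FILTERS)]
hyperplanes = make_hyperplanes(NUM_TABLES, BITS, DIM, rng)
tables = build_tables(filters, hyperplanes)

# Query with an image window that happens to match filter 42 exactly.
window = list(filters[42])
candidates, votes = probe(window, tables, hyperplanes,
                          min_votes=NUM_TABLES // 2)

# Exact dot products are computed only for the few surviving candidates.
best = max(candidates, key=lambda fid: dot(window, filters[fid]))
print(best, len(candidates))
```

In this sketch the number of probes per window is fixed at `NUM_TABLES`, which mirrors the "time independent of the size of the filter bank" claim. Bucket sizes still depend on how the filters are distributed, so the number of candidates that need an exact dot product is data-dependent.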
