Using Web Co-occurrence Statistics for Improving Image Categorization
Venue
arXiv (2013)
Publication Year
2013
Authors
Samy Bengio, Jeffrey Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer
Abstract
Object recognition and localization are important tasks in computer vision. The
focus of this work is the incorporation of contextual information in order to
improve object recognition and localization. For instance, it is natural not to
expect to see an elephant in the middle of an ocean. We consider a simple
approach to encapsulate such common sense knowledge using co-occurrence statistics
from web documents. By merely counting the number of times nouns (such as
elephants, sharks, oceans, etc.) co-occur in web documents, we obtain a good
estimate of expected co-occurrences in visual data. We then cast the problem of
combining textual co-occurrence statistics with the predictions of image-based
classifiers as an optimization problem. The resulting optimization problem serves
as a surrogate for our inference procedure. Despite the simplicity of the resulting
optimization problem, it is effective in improving both recognition and
localization accuracy. Concretely, we observe significant improvements in
recognition and localization rates for both ImageNet Detection 2012 and SUN 2012
datasets.
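
The counting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `cooccurrence_counts`, the toy documents, the small vocabulary, and the whitespace tokenization are all assumptions made for illustration, and the optimization that combines these statistics with classifier predictions is not shown because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each unordered pair of vocabulary nouns,
    the number of documents in which both nouns appear.

    Hypothetical helper for illustration only.
    """
    vocab = set(vocabulary)
    pair_counts = Counter()
    for doc in documents:
        # Naive whitespace tokenization (an assumption); a real system
        # would need proper noun extraction and normalization.
        present = sorted(vocab.intersection(doc.lower().split()))
        # Each pair is counted at most once per document,
        # keyed in alphabetical order.
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy stand-ins for web documents (made up for this example).
docs = [
    "the shark swam through the ocean",
    "an elephant walked across the savanna",
    "a shark and a whale share the ocean",
    "the elephant drank at the river",
]
counts = cooccurrence_counts(docs, ["elephant", "shark", "ocean", "savanna"])

for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy corpus "shark" and "ocean" co-occur while "elephant" and "ocean" never do, which is the kind of signal the abstract describes as a proxy for expected co-occurrence in images.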
