On Learning Where to Look
Venue
Google Inc. (2014)
Publication Year
2014
Authors
Marc'Aurelio Ranzato
Abstract
Current automatic vision systems face two major challenges: scalability and extreme
variability of appearance. First, the computational time required to process an
image typically scales linearly with the number of pixels in the image, therefore
limiting the resolution of input images to thumbnail size. Second, variability in
appearance and pose of the objects constitutes a major hurdle for robust recognition
and detection. In this work, we propose a model that makes baby steps towards
addressing these challenges. We describe a learning-based method that recognizes
objects through a series of glimpses. This system performs an amount of computation
that scales with the complexity of the input rather than its number of pixels.
Moreover, the proposed method is potentially more robust to changes in appearance
since its parameters are learned in a data-driven manner. Preliminary experiments
on a dataset of handwritten digits demonstrate the computational advantages of this
approach.
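
To make the cost argument concrete, the following is a minimal Python sketch of glimpse-based processing. It is not the model from the paper. The function names (extract_glimpse, next_location, run_glimpses), the patch size, and the stopping rule are all assumptions made for illustration, and the hand-written center-of-mass rule stands in for the learned policy that the abstract describes. The sketch only shows that the number of pixels read depends on how many glimpses are taken, not on the image size.

```python
import numpy as np


def extract_glimpse(image, center, size):
    """Crop a size x size patch around center (row, col).

    Regions falling outside the image are left as zeros. Only the
    pixels inside the patch are read, never the whole image.
    """
    half = size // 2
    r0, c0 = center[0] - half, center[1] - half
    patch = np.zeros((size, size), dtype=image.dtype)
    # Clip the window to the image bounds.
    rs, re = max(r0, 0), min(r0 + size, image.shape[0])
    cs, ce = max(c0, 0), min(c0 + size, image.shape[1])
    patch[rs - r0:re - r0, cs - c0:ce - c0] = image[rs:re, cs:ce]
    return patch


def next_location(patch, center):
    """Hypothetical placeholder policy (NOT learned): move the glimpse
    center to the intensity center of mass of the current patch."""
    total = patch.sum()
    if total == 0:
        return center  # nothing visible, stay put
    half = patch.shape[0] // 2
    rows, cols = np.indices(patch.shape)
    dr = int(round(float((rows * patch).sum() / total))) - half
    dc = int(round(float((cols * patch).sum() / total))) - half
    return (center[0] + dr, center[1] + dc)


def run_glimpses(image, start, size=7, max_glimpses=10):
    """Take glimpses until the location stops moving or the budget runs out.

    Returns the visited locations and the total number of pixels read.
    """
    center = start
    trajectory = [center]
    pixels_read = 0
    for _ in range(max_glimpses):
        patch = extract_glimpse(image, center, size)
        pixels_read += patch.size
        new_center = next_location(patch, center)
        if new_center == center:  # assumed stopping rule: fixation reached
            break
        center = new_center
        trajectory.append(center)
    return trajectory, pixels_read


if __name__ == "__main__":
    # Toy input: a 100 x 100 image with a single bright pixel.
    image = np.zeros((100, 100))
    image[52, 53] = 1.0
    trajectory, pixels_read = run_glimpses(image, start=(50, 50))
    print("visited:", trajectory)
    print("pixels read:", pixels_read, "of", image.size)
```

In this toy setting the cost is the number of glimpses times the patch area, so enlarging the image while keeping the content fixed leaves the amount of work unchanged. A learned system would replace next_location with a trained model and add a classifier over the sequence of patches.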
