Reading Digits in Natural Images with Unsupervised Feature Learning
Venue
NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011
Publication Year
2011
Authors
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng
Abstract
Detecting and reading text from natural images is a hard computer vision task that
is central to a variety of emerging applications. Related problems like document
character recognition have been widely studied by computer vision and machine
learning researchers and are virtually solved for practical applications like
reading handwritten digits. Reliably recognizing characters in more complex scenes
like photographs, however, is far more difficult: the best existing methods lag well
behind human performance on the same tasks. In this paper we attack the problem of
recognizing digits in a real application using unsupervised feature learning
methods: reading house numbers from street-level photos. To this end, we introduce
a new benchmark dataset for research use containing over 600,000 labeled digits
cropped from Street View images. We then demonstrate the difficulty of recognizing
these digits when the problem is approached with hand-designed features. Finally,
we employ variants of two recently proposed unsupervised feature learning methods
and find that they are convincingly superior on our benchmarks.
