Training Deep Neural Networks on Noisy Labels with Bootstrapping
Venue
ICLR 2015
Publication Year
2015
Authors
Scott E. Reed, Honglak Lee, Dragomir Anguelov, Christian Szegedy, Dumitru Erhan, Andrew Rabinovich
Abstract
Current state-of-the-art deep learning systems for visual object recognition and
detection use purely supervised training with regularization such as dropout to
avoid overfitting. The performance depends critically on the number of labeled
examples, and in current practice the labels are assumed to be unambiguous and
accurate. However, this assumption often does not hold; e.g. in recognition, class
labels may be missing; in detection, objects in the image may not be localized; and
in general, the labeling may be subjective. In this work we propose a generic way
to handle noisy and incomplete labeling by augmenting the prediction objective with
a notion of consistency. We consider a prediction consistent if the same prediction
is made given similar percepts, where the notion of similarity is between deep
network features computed from the input data. In experiments we demonstrate that
our approach yields substantial robustness to label noise on several datasets. On
MNIST handwritten digits, we show that our model is robust to label corruption. On
the Toronto Face Database, we show that our model handles subjective labels in
emotion recognition well, achieving state-of-the-art results, and
can also benefit from unlabeled face images with no modification to our method. On
the ILSVRC2014 detection challenge data, we show that our approach extends to very
deep networks, high-resolution images, and structured outputs, and results in
improved scalable detection.
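
The abstract describes the method only at the level of prediction consistency. As a rough, illustrative sketch (not the authors' code), the snippet below shows the soft and hard bootstrapping targets this paper is known for: the training target is a convex combination of the observed, possibly noisy label and the network's own current prediction (its full softmax distribution in the soft variant, its argmax in the hard variant). The NumPy framing, function names, and beta values are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_bootstrap_loss(logits, noisy_one_hot, beta=0.95):
    # Soft bootstrapping (illustrative): mix the noisy label with the model's
    # predicted distribution q, then take cross-entropy against q.
    # beta = 1.0 recovers ordinary cross-entropy on the given labels.
    q = softmax(logits)
    target = beta * noisy_one_hot + (1.0 - beta) * q
    return -np.sum(target * np.log(q + 1e-12), axis=-1).mean()

def hard_bootstrap_loss(logits, noisy_one_hot, beta=0.8):
    # Hard bootstrapping (illustrative): mix the noisy label with a one-hot
    # vector at the model's current argmax prediction (its "best guess").
    q = softmax(logits)
    guess = np.eye(q.shape[-1])[q.argmax(axis=-1)]
    target = beta * noisy_one_hot + (1.0 - beta) * guess
    return -np.sum(target * np.log(q + 1e-12), axis=-1).mean()

# Toy usage: 4 examples, 3 classes, with (possibly wrong) one-hot labels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.eye(3)[[0, 1, 2, 0]]
print(soft_bootstrap_loss(logits, labels))
print(hard_bootstrap_loss(logits, labels))
```

In this reading, the beta weight controls how much the model is allowed to trust its own predictions over the given labels: smaller values give more robustness to label noise but risk reinforcing the model's own mistakes.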
