Realistic Evaluation of Semi-Supervised Learning Algorithms
Augustus Odena, Avital Oliver, Colin Raffel, Ekin Dogus Cubuk, Ian Goodfellow
Semi-supervised learning (SSL) provides a powerful framework for leveraging
unlabeled data when labels are limited or expensive to obtain. Approaches based on
deep neural networks have recently proven successful on standard benchmark tasks.
However, we argue that these benchmarks do not reflect real-world requirements and
that methods evaluated on them are compared to weak baselines. We propose a set of
new benchmarks and find that
simple baselines that were previously underappreciated outperform more complicated
research ideas that were regarded as state of the art. Using our new
benchmarking procedures, we additionally find that SSL methods are highly sensitive
to the amount of unlabeled data and the class distribution of the data. We
encourage researchers studying SSL to adopt our improved methodology, and suggest
that readers and reviewers of SSL papers familiarize themselves with the experimental
design concerns we identify.