Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Abstract
Feature selection is essential for effective visual recognition.
We propose an efficient joint classifier learning
and feature selection method that discovers sparse,
compact representations of input features from a vast
sea of candidates, with an almost unsupervised formulation.
Our method requires only one piece of prior knowledge,
which we call the feature sign: whether or not
a particular feature has, on average, stronger values over
positive samples than over negatives. We show how this
can be estimated using as few as a single labeled training
sample per class. Then, using these feature signs,
we extend an initial supervised learning problem into an
(almost) unsupervised clustering formulation that can
incorporate new data without requiring ground truth
labels. Our method works both as a feature selection
mechanism and as a fully competitive classifier. It has
two important properties: low computational cost and excellent
accuracy, especially in difficult cases of very limited
training data. We experiment on large-scale recognition
in video and show speed and performance superior
to established feature selection approaches such
as AdaBoost, Lasso, and greedy forward-backward selection,
as well as to powerful classifiers such as SVM.
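
To make the notion of a feature sign concrete, the following is a minimal sketch of how it could be estimated from a handful of labeled samples, directly following the definition given above (mean feature value over positives compared with mean over negatives). It is an illustration under stated assumptions, not the method's actual implementation: the function name `estimate_feature_signs`, the +1/-1 encoding, and the choice to break ties as +1 are all hypothetical.

```python
import numpy as np

def estimate_feature_signs(X, y):
    """Estimate one sign per feature from labeled samples.

    X: (n_samples, n_features) array of feature responses.
    y: (n_samples,) array of binary labels (1 = positive, 0 = negative).
    Returns an int array of +1 / -1 values, one per feature.
    The +1/-1 encoding and the tie-breaking rule are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    if not y.any() or y.all():
        raise ValueError("need at least one positive and one negative sample")

    pos_mean = X[y].mean(axis=0)    # average response over positive samples
    neg_mean = X[~y].mean(axis=0)   # average response over negative samples

    # +1 if the feature is on average at least as strong on positives, else -1.
    return np.where(pos_mean >= neg_mean, 1, -1)

# Toy example with a single labeled sample per class and three features.
X_labeled = [[0.9, 0.2, 0.5],   # positive sample
             [0.1, 0.7, 0.5]]   # negative sample
y_labeled = [1, 0]
signs = estimate_feature_signs(X_labeled, y_labeled)
print(signs)
```

With only one sample per class, each class mean reduces to that single sample's feature vector, so the estimate can be noisy for individual features; the sketch only shows what is being estimated, not how reliable it is.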