Publication Data
An Online Algorithm for Large Scale Image Similarity Learning
Abstract: Learning a measure of similarity between pairs of objects is
a fundamental problem in machine learning. It lies at the core of classification
methods like kernel machines, and is particularly useful for applications like
searching for images that are similar to a given image or finding videos that are
relevant to a given video. In these tasks, users look for objects that are not only
visually similar but also semantically related to a given object. Unfortunately,
current approaches for learning similarity do not scale to large datasets, especially
when imposing metric constraints on the learned similarity. We describe OASIS, a method
for learning pairwise similarity that is fast and scales linearly with the number of
objects and the number of non-zero features. Scalability is achieved through online
learning of a bilinear model over sparse representations using a large margin criterion
and an efficient hinge loss cost. OASIS is accurate at a wide range of scales: on a
standard benchmark with thousands of images, it is more precise than state-of-the-art
methods, and faster by orders of magnitude. On 2 million images collected from the
web, OASIS can be trained within 3 days on a single CPU. The non-metric similarities
learned by OASIS can be transformed into metric similarities, achieving higher
precision than similarities that are learned as metrics in the first place. This
suggests an approach for learning a metric from data that is larger by two orders of
magnitude than was handled before.
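
The abstract names the ingredients (a bilinear model, a large margin criterion, a hinge loss, and online updates) but not the update rule itself. The following is a minimal sketch of how they could fit together, assuming a passive-aggressive style step on triplets (query, relevant item, irrelevant item). The function names, the margin of 1, and the aggressiveness parameter C are illustrative assumptions, not details taken from the abstract, and dense NumPy arrays stand in for the sparse representations it mentions.

```python
import numpy as np

def similarity(W, p, q):
    """Bilinear similarity S_W(p, q) = p^T W q. W need not be symmetric or PSD."""
    return float(p @ W @ q)

def triplet_hinge_loss(W, p, p_pos, p_neg):
    """Large margin hinge loss: p should be closer to p_pos than to p_neg by a margin of 1."""
    return max(0.0, 1.0 - similarity(W, p, p_pos) + similarity(W, p, p_neg))

def online_update(W, p, p_pos, p_neg, C=0.1):
    """One online step (assumed passive-aggressive form, not confirmed by the abstract).

    W is left unchanged when the triplet already satisfies the margin.
    Otherwise it moves along the negative loss gradient, with a step size capped at C.
    """
    loss = triplet_hinge_loss(W, p, p_pos, p_neg)
    if loss == 0.0:
        return W
    # Gradient of S_W(p, p_pos) - S_W(p, p_neg) with respect to W.
    V = np.outer(p, p_pos - p_neg)
    norm_sq = float(np.sum(V * V))
    if norm_sq == 0.0:
        return W
    tau = min(C, loss / norm_sq)
    return W + tau * V

# Toy usage: start from the identity and process one violated triplet.
W = np.eye(3)
p = np.array([1.0, 0.0, 0.0])
p_pos = np.array([0.0, 1.0, 0.0])
p_neg = np.array([1.0, 0.0, 0.0])
W_new = online_update(W, p, p_pos, p_neg, C=10.0)
```

Each step touches only an outer product of the query with a difference of two items, so with sparse inputs the cost would depend on the number of non-zero features, which is consistent with the scaling claim in the abstract.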
