On the Difficulty of Nearest Neighbor Search
Venue
International Conference on Machine Learning (ICML) (2012)
Publication Year
2012
Authors
Junfeng He, Sanjiv Kumar, Shih-Fu Chang
BibTeX
Abstract
Fast approximate nearest neighbor (NN) search in large databases is becoming
popular and several powerful learning-based formulations have been proposed
recently. However, not much attention has been paid to a more fundamental question:
how difficult is (approximate) nearest neighbor search in a given data set? And
which data properties affect the difficulty of nearest neighbor search and how?
This paper introduces the first concrete measure called Relative Contrast that can
be used to evaluate the influence of several crucial data characteristics such as
dimensionality, sparsity, and database size simultaneously in arbitrary normed
metric spaces. Moreover, we present a theoretical analysis to show how relative
contrast affects the complexity of Local Sensitive Hashing, a popular approximate
NN search method. Relative contrast also provides an explanation for a family of
heuristic hashing algorithms with good practical performance based on PCA. Finally,
we show that most of the previous works measuring meaningfulness or difficulty of
NN search can be derived as special asymptotic cases for dense vectors of the
proposed measure.
