Difficulty of Nearest Neighbor Search
Abstract: Fast approximate nearest neighbor (NN) search in large
databases is becoming popular and several powerful learning-based formulations have
been proposed recently. However, not much attention has been paid to a more fundamental
question: how difficult is (approximate) nearest neighbor search in a given data set?
And which data properties affect the difficulty of nearest neighbor search and how?
This paper introduces the first concrete measure called Relative Contrast that can be
used to evaluate the influence of several crucial data characteristics such as
dimensionality, sparsity, and database size simultaneously in arbitrary normed metric
spaces. Moreover, we present a theoretical analysis to show how relative contrast
affects the complexity of Local Sensitive Hashing, a popular approximate NN search
method. Relative contrast also provides an explanation for a family of heuristic
hashing algorithms with good practical performance based on PCA. Finally, we show that
most of the previous works measuring meaningfulness or difficulty of NN search can be
derived as special asymptotic cases for dense vectors of the proposed measure.