Reporting Neighbors in High-Dimensional Euclidean Space
Venue
SIAM journal of computing, vol. 43 (2014), pp. 1239-1511
Publication Year
2014
Authors
Dror Aiger, Haim Kaplan, Micha Sharir
BibTeX
Abstract
We consider the following problem, which arises in many database and web-based
applications: Given a set P of n points in a high-dimensional space Rd and a
distance r, we want to report all pairs of points of P at Euclidean distance at
most r. We present two randomized algorithms, one based on randomly shifted grids,
and the other on randomly shifted and rotated grids. The running time of both
algorithms is of the form C(d)(n + k)log n, where k is the output size and C(d) is
a constant that depends on the dimension d. The log n factor is needed to
guarantee, with high probability, that all neighbor pairs are reported, and can be
dropped if it suffices to report, in expectation, an arbitrarily large fraction of
the pairs. When only translations are used, C(d) is of the form (a√d)d, for some
(small) absolute constant a≈0.484; this bound is worst-case tight, up to an
exponential factor of about 2d. When both rotations and translations are used, C(d)
can be improved to roughly 6.74d, getting rid of the super-exponential factor √dd.
When the input set (lies in a subset of d-space that) has low doubling dimension
,the performance of the first algorithm improves to C(d,δ)(n + k)log n (or to
C(d,δ)(n + k)), where C(d,δ)=O((ed/δ),δ), for δ≤ √d. Otherwise, (d,δ)=O(e√d√dδ. We
also present experimental results on several large datasets, demonstrating that our
algorithms run significantly faster than all the leading existing algorithms for
reporting neighbors.
