Banding for Large-Scale Retrieval with Memory and Recall Constraints
Abstract: Locality Sensitive Hashing (LSH) is widely used for
efficient retrieval of candidate matches in very large audio, video, and image systems.
However, extremely large reference databases necessitate a guaranteed limit on the
memory used by the table lookup itself, no matter how the entries crowd different parts
of the signature space, a guarantee that LSH does not give. In this paper, we provide
such guaranteed limits, primarily through the design of the LSH bands. When combined
with data-adaptive bin splitting (needed on only 0.04% of the occupied bins), this
approach provides the required guarantee on memory usage. At the same time, it avoids
the reduced recall that more extensive use of bin splitting would give.