Permutation Indexing: Fast Approximate Retrieval from Large Corpora
Venue
22nd International Conference on Information and Knowledge Management (CIKM), ACM (2013)
Publication Year
2013
Authors
BibTeX
Abstract
Inverted indexing is a ubiquitous technique used in retrieval systems including web
search. Despite its popularity, it has a drawback - query retrieval time is highly
variable and grows with the corpus size. In this work we propose an alternative
technique, permutation indexing, where retrieval cost is strictly bounded and has
only logarithmic dependence on the corpus size. Our approach is based on two novel
techniques:
- partitioning of the term space into overlapping clusters of terms that frequently co-occur in queries, and
- a data structure for compactly encoding results of all queries composed of terms in a cluster as continuous sequences of document ids.
