Discrete Point Based Signatures and Applications to Document Matching
Nemanja Spasojevic, Guillaume Poncin, Dan Bloomberg
Document analysis often starts with robust signatures, for instance for looking up a
document from a low-quality photograph or for measuring similarity between scanned
books. Signatures based on OCR typically work well, but they require good-quality OCR,
which is not always available and can be very costly. In this paper we describe a novel
scheme for extracting discrete signatures from document images. It operates on
points that describe the positions of words, typically their centroids. Each point is
extracted using one of several techniques and assigned a signature based on its
relation to its nearest neighbors. We discuss the benefits of this approach and
demonstrate its application to multiple problems, including fast image similarity
calculation and document lookup.
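The abstract does not specify how a point's relation to its neighbors becomes a discrete signature, so the following is only a minimal sketch under stated assumptions. It supposes that each word centroid is described by the quantized directions to its `k` nearest neighbors, and that two pages are compared by the multiset Jaccard similarity of their signature bags. The function names `point_signatures` and `signature_similarity` and the parameters `k` and `angle_bins` are hypothetical, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def point_signatures(points: List[Point], k: int = 4, angle_bins: int = 8) -> List[int]:
    """Assign each point a discrete signature from directions to its k nearest neighbors.

    Hypothetical scheme (an assumption, not the paper's): each neighbor direction is
    quantized into `angle_bins` sectors centered on the axes, and the sorted sector
    codes are packed into one integer. Directions are unchanged by translation and
    uniform scaling of the page.
    """
    signatures = []
    for i, (x, y) in enumerate(points):
        # Brute-force k-nearest-neighbor search by squared Euclidean distance;
        # ties are broken by input order because sorted() is stable.
        neighbors = sorted(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2,
        )[:k]
        codes = []
        for nx, ny in neighbors:
            angle = math.atan2(ny - y, nx - x)  # in (-pi, pi]
            # Shift by half a sector so axis-aligned neighbors fall mid-bin.
            codes.append(math.floor(angle / (2 * math.pi) * angle_bins + 0.5) % angle_bins)
        codes.sort()  # make the signature independent of neighbor order
        signature = len(codes)  # prefix with the count so short code lists don't collide
        for c in codes:
            signature = signature * angle_bins + c
        signatures.append(signature)
    return signatures


def signature_similarity(a: List[int], b: List[int]) -> float:
    """Multiset Jaccard similarity between two bags of point signatures."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0


page = [(0, 0), (10, 0), (20, 0), (0, 5), (10, 5), (20, 5)]
photo = [(2 * x + 3, 2 * y + 7) for x, y in page]  # shifted and rescaled copy
print(signature_similarity(point_signatures(page), point_signatures(photo)))  # 1.0
```

Centering the sectors on the axes is a design choice for roughly axis-aligned text, where most neighbor directions are horizontal or vertical and small jitter then stays inside one bin. The quadratic neighbor search is only for brevity; a spatial index such as a k-d tree would be the usual choice for pages with many words.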