Image Retrieval with Deep Local Features and Attention-based Keypoints
Hyeonwoo Noh, Andre Araujo, Jack Sim, Bohyung Han
We introduce a local feature descriptor for large-scale image retrieval applications, called DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained on a landmark image dataset using only image-level annotations; no object- or patch-level labels are required. To enhance DELF's image retrieval performance, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This new framework can be used in image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our technique is particularly useful in the large-scale setting, where a system must operate with high precision. In this case, our system produces reliable confidence scores to reject false positives effectively; in particular, it is robust against queries that have no correct match in the database. We present an evaluation methodology for this challenging retrieval setting, using standard and large-scale datasets. We show that recently proposed methods do not perform well in this setup; DELF outperforms several recent global and local descriptors by substantial margins.
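The attention-based keypoint selection described above can be sketched as follows: a learned scoring function assigns a relevance score to each location of a dense CNN feature map, and the highest-scoring locations are kept as keypoints, with the local features at those locations serving as L2-normalized descriptors. This is only an illustrative NumPy sketch under assumed shapes and a softplus scoring head; the feature map size, the projection vector `w`, and the `select_keypoints` helper are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_scores(features, w):
    # Score each spatial location with a learned 1x1 projection,
    # passed through softplus so scores are nonnegative.
    logits = features @ w                      # (H, W)
    return np.log1p(np.exp(logits))            # softplus

def select_keypoints(features, w, k=5):
    # Keep the k locations with the highest attention score as keypoints;
    # the feature vectors at those locations become the local descriptors.
    scores = attention_scores(features, w)
    flat = scores.ravel()
    top = np.argsort(flat)[::-1][:k]           # indices of k highest scores
    coords = np.stack(np.unravel_index(top, scores.shape), axis=1)
    descs = features[coords[:, 0], coords[:, 1]]
    descs = descs / np.linalg.norm(descs, axis=1, keepdims=True)  # L2-normalize
    return coords, descs, flat[top]

# Toy dense feature map standing in for a CNN's output (assumed shape).
features = rng.standard_normal((14, 14, 32))
w = rng.standard_normal(32)                    # toy attention projection
coords, descs, s = select_keypoints(features, w, k=5)
print(coords.shape, descs.shape)               # (5, 2) (5, 32)
```

Because the attention head shares the backbone's feature map, scoring costs only one extra projection per location; at retrieval time the same selection yields a sparse set of (location, descriptor) pairs usable for matching and geometric verification.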