PlaNet - Photo Geolocation with Convolutional Neural Networks
Venue
European Conference on Computer Vision (ECCV) (2016) (to appear)
Publication Year
2016
Authors
Tobias Weyand, Ilya Kostrikov, James Philbin
Abstract
Is it possible to determine the location of a photo from just its pixels? While the
general problem seems exceptionally difficult, photos often contain cues such as
landmarks, weather patterns, vegetation, road markings, and architectural details,
which in combination allow the location to be inferred. In computer vision, this problem
is usually approached using image retrieval methods. In contrast, we pose the
problem as one of classification by subdividing the surface of the earth into
thousands of multi-scale geographic cells, and train a deep network using millions
of geotagged images. We show that the resulting model, called PlaNet, outperforms
previous approaches and even attains superhuman accuracy in some cases. Moreover,
we extend our model to photo albums by combining it with a long short-term memory
(LSTM) architecture. By learning to exploit temporal coherence to geolocate
uncertain photos, this model achieves a 50% performance improvement over the
single-image model.
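
To make the classification framing concrete, the following is a minimal sketch of how geotagged photos could be turned into class labels by subdividing the globe more finely where photos are dense. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the partitioning scheme, so a simple latitude/longitude quadtree is used as a stand-in, and the names Cell, subdivide, assign_labels, build_partition, and find_leaf, as well as the max_photos and max_depth thresholds, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Cell:
    """A rectangular lat/lng cell; bounds are inclusive-low, exclusive-high."""
    lat_min: float
    lat_max: float
    lng_min: float
    lng_max: float
    children: List["Cell"] = field(default_factory=list)
    label: Optional[int] = None  # class index, set on leaf cells only

    def contains(self, p: Point) -> bool:
        lat, lng = p
        return (self.lat_min <= lat < self.lat_max
                and self.lng_min <= lng < self.lng_max)


def subdivide(cell: Cell, points: List[Point],
              max_photos: int, max_depth: int, depth: int = 0) -> None:
    """Split a cell into four while it holds more than max_photos points."""
    if len(points) <= max_photos or depth >= max_depth:
        return
    lat_mid = (cell.lat_min + cell.lat_max) / 2
    lng_mid = (cell.lng_min + cell.lng_max) / 2
    for lat_lo, lat_hi in ((cell.lat_min, lat_mid), (lat_mid, cell.lat_max)):
        for lng_lo, lng_hi in ((cell.lng_min, lng_mid), (lng_mid, cell.lng_max)):
            child = Cell(lat_lo, lat_hi, lng_lo, lng_hi)
            inside = [p for p in points if child.contains(p)]
            subdivide(child, inside, max_photos, max_depth, depth + 1)
            cell.children.append(child)


def assign_labels(cell: Cell, next_label: int = 0) -> int:
    """Number the leaf cells 0..N-1 and return N, the number of classes."""
    if not cell.children:
        cell.label = next_label
        return next_label + 1
    for child in cell.children:
        next_label = assign_labels(child, next_label)
    return next_label


def build_partition(points: List[Point], max_photos: int = 2,
                    max_depth: int = 10) -> Tuple[Cell, int]:
    """Build the multi-scale partition; thresholds are illustrative only."""
    eps = 1e-9  # pad the upper bounds so lat=90 and lng=180 fall inside the root
    root = Cell(-90.0, 90.0 + eps, -180.0, 180.0 + eps)
    subdivide(root, list(points), max_photos, max_depth)
    n_classes = assign_labels(root)
    return root, n_classes


def find_leaf(root: Cell, p: Point) -> Cell:
    """Descend to the leaf cell containing p; its label is the training target."""
    cell = root
    while cell.children:
        cell = next(c for c in cell.children if c.contains(p))
    return cell
```

In this sketch, each training photo would be labeled with find_leaf(root, (lat, lng)).label, and a classifier would output a distribution over the n_classes leaf cells. Densely photographed regions end up with small cells and sparse regions with large ones, which is one way to read the "multi-scale" cells mentioned above.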
