DeepStereo: Learning to Predict New Views From the World's Imagery
Venue
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
Publication Year
2016
Authors
John Flynn, Ivan Neulander, James Philbin, Noah Snavely
Abstract
Deep networks have recently enjoyed enormous success when applied to recognition
and classification problems in computer vision [22, 32], but their use in graphics
problems has been limited ([23, 7] are notable recent exceptions). In this work, we
present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains) and high-quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. We show view interpolation results on imagery from the KITTI dataset [12] and from the dataset of [1], as well as on Street View images. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
