Learning deep representations for ground-to-aerial geolocalization
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Learning deep representations for ground-to-aerial geolocalization. / Lin, Tsung-Yi; Cui, Yin; Belongie, Serge; Hays, James.
In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14.10.2015, p. 5007-5015.
RIS
TY - GEN
T1 - Learning deep representations for ground-to-aerial geolocalization
AU - Lin, Tsung-Yi
AU - Cui, Yin
AU - Belongie, Serge
AU - Hays, James
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - The recent availability of geo-tagged images and rich geospatial data has inspired a number of algorithms for image based geolocalization. Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data). However, most of the Earth does not have ground-level reference photos available. Fortunately, more complete coverage is provided by oblique aerial or 'bird's eye' imagery. In this work, we localize a ground-level query image by matching it to a reference database of aerial imagery. We use publicly available data to build a dataset of 78K aligned cross-view image pairs. The primary challenge for this task is that traditional computer vision approaches cannot handle the wide baseline and appearance variation of these cross-view pairs. We use our dataset to learn a feature representation in which matching views are near one another and mismatched views are far apart. Our proposed approach, Where-CNN, is inspired by deep learning success in face verification and achieves significant improvements over traditional hand-crafted features and existing deep features learned from other large-scale databases. We show the effectiveness of Where-CNN in finding matches between street view and aerial view imagery and demonstrate the ability of our learned features to generalize to novel locations.
UR - http://www.scopus.com/inward/record.url?scp=84959245070&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7299135
DO - 10.1109/CVPR.2015.7299135
M3 - Conference article
AN - SCOPUS:84959245070
SP - 5007
EP - 5015
JO - IEEE Conference on Computer Vision and Pattern Recognition. Proceedings
JF - IEEE Conference on Computer Vision and Pattern Recognition. Proceedings
SN - 1063-6919
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Y2 - 7 June 2015 through 12 June 2015
ER -
ID: 301829041
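
The abstract describes learning a feature representation in which matched ground-level and aerial views lie close together and mismatched views lie far apart, in the spirit of deep models used for face verification. The following is a minimal illustrative sketch of that pairwise contrastive-embedding idea, not the authors' released code; the encoder architecture, embedding dimension, and margin below are placeholder assumptions.

# Sketch of a cross-view contrastive embedding (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    """Small CNN mapping an image to an L2-normalized embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)

def contrastive_loss(z_ground, z_aerial, same_place, margin=1.0):
    """Pull matched cross-view pairs together, push mismatched pairs apart."""
    d = F.pairwise_distance(z_ground, z_aerial)
    pos = same_place * d.pow(2)                       # matched pairs: small distance
    neg = (1 - same_place) * F.relu(margin - d).pow(2)  # mismatched pairs: at least `margin` apart
    return (pos + neg).mean()

if __name__ == "__main__":
    enc_ground, enc_aerial = ViewEncoder(), ViewEncoder()  # one encoder per view
    ground = torch.randn(8, 3, 64, 64)
    aerial = torch.randn(8, 3, 64, 64)
    labels = torch.randint(0, 2, (8,)).float()              # 1 = same location, 0 = different
    loss = contrastive_loss(enc_ground(ground), enc_aerial(aerial), labels)
    loss.backward()
    print(loss.item())

At retrieval time, the query ground-level embedding would be compared against precomputed aerial embeddings by nearest-neighbor search in this learned space.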