Graph sampling based deep metric learning for cross-view geo-localization
Journal of Physics: Conference Series Pub Date : 2024-02-01 , DOI: 10.1088/1742-6596/2711/1/012004
Haozhang Jia

Cross-view geo-localization is a relatively new computer vision task that has attracted growing attention, mainly because of its practical value for drone navigation and drone-view localization. The task is demanding because it requires matching images across domains. There are generally two ways to train a neural network to match corresponding satellite and drone-view images: representation learning with classifiers and an identity loss, and metric learning with pairwise matching within mini-batches. The first incurs extra computing and memory costs in large-scale learning, so this paper follows a person re-identification method called QAConv-GS. It implements a graph sampler that mines the hardest data to form mini-batches, and a QAConv module with extra attention layers appended to compute the similarity between image pairs. A batch-wise OHEM triplet loss is then used for model training. With these implementations and adaptations combined, this paper significantly improves the state of the art on the challenging University-1652 dataset.
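
The two training ingredients named in the abstract, graph sampling of hard mini-batches and a batch-wise hard-example triplet loss, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names, the precomputed class-distance matrix `class_dist`, the similarity matrix `sim`, and the margin value are all assumptions made for illustration, and the QAConv similarity module and attention layers are left out entirely.

```python
import random


def graph_sample_batches(class_dist, classes_per_batch, seed=0):
    """Group each anchor class with its nearest neighbor classes.

    class_dist[a][b] is an assumed precomputed distance between classes
    a and b (smaller = more easily confused). Each batch is a list of
    class indices: the anchor first, then its closest neighbors.
    """
    rng = random.Random(seed)
    num_classes = len(class_dist)
    anchors = list(range(num_classes))
    rng.shuffle(anchors)  # visit anchors in random order each epoch
    batches = []
    for anchor in anchors:
        others = [c for c in range(num_classes) if c != anchor]
        # Nearest classes are the hardest negatives for this anchor.
        others.sort(key=lambda c: class_dist[anchor][c])
        batches.append([anchor] + others[: classes_per_batch - 1])
    return batches


def batch_hard_triplet_loss(sim, labels, margin=0.5):
    """Hard-example triplet loss over one mini-batch of similarities.

    sim[i][j] is a similarity score between samples i and j (higher =
    more similar). For each sample, the least similar positive and the
    most similar negative are selected, and a hinge is applied.
    """
    n = len(labels)
    losses = []
    for i in range(n):
        pos = [sim[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # no valid triplet for this sample
        hardest_pos = min(pos)
        hardest_neg = max(neg)
        losses.append(max(0.0, margin + hardest_neg - hardest_pos))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy distances: classes 0 and 1 are close, as are classes 2 and 3.
    dist = [[0, 1, 5, 6], [1, 0, 4, 7], [5, 4, 0, 2], [6, 7, 2, 0]]
    print(graph_sample_batches(dist, classes_per_batch=2))

    # Toy similarities for three samples with labels 0, 0, 1.
    sim = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.6], [0.3, 0.6, 1.0]]
    print(batch_hard_triplet_loss(sim, [0, 0, 1], margin=0.5))
```

In this sketch the sampler only decides which classes share a mini-batch; drawing drone-view and satellite images for each selected class, and scoring the pairs, would be handled by the rest of the training pipeline.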

Updated: 2024-02-01