Robust Visual Place Recognition for Severe Appearance Changes
IEEE Robotics and Automation Letters ( IF 5.2 ) Pub Date : 2024-03-13 , DOI: 10.1109/lra.2024.3376967
Haiyang Jiang 1 , Songhao Piao 1 , Huai Yu 2 , Wei Li 3 , Lei Yu 2

Severe appearance changes pose a pervasive and intricate challenge in Visual Place Recognition (VPR), and the current best-performing solutions adopt a composite strategy of global retrieval followed by reranking. However, these reranking techniques require elaborate designs to extract and match local features, which markedly increases computational cost and inference time. To this end, we propose a novel framework that unifies global and local features within a single pipeline network, offering a simple solution that operates seamlessly across diverse scenarios without additional cumbersome structures. Specifically, we train discriminative global features via image classification techniques while concurrently extracting effective local features directly from the intermediate layers, with no extra operations. To enrich feature expressiveness, we fuse multi-layer Convolutional Neural Network (CNN) feature maps so as to combine diverse semantic information. Concurrently, a Transformer with relative position encoding captures cross-layer long-range and positional correlations. Combined with the associated attention values, the low-resolution feature maps reduce the number of features involved in matching, lowering computational overhead and markedly accelerating reranking. Extensive experiments show that our model achieves State-Of-The-Art (SOTA) performance on datasets with severe appearance changes, with the fastest inference time and the lowest memory usage.
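The speed-up described above comes from using the Transformer's attention values to prune the local features that enter reranking: only the most attended locations of a low-resolution feature map are kept as descriptors, and matching runs on that small subset. The following is a minimal NumPy sketch of that idea under assumed shapes; the function names, the top-k selection rule, and the mutual nearest-neighbour matcher are illustrative choices, not the authors' implementation.

```python
import numpy as np

def select_local_features(feat_map, attn, k):
    """Keep the k most-attended locations of a low-resolution feature map
    as local descriptors for reranking.
    feat_map: (H, W, D) array of per-location descriptors.
    attn:     (H, W) attention values from the Transformer (assumed)."""
    H, W, D = feat_map.shape
    flat_feats = feat_map.reshape(-1, D)
    flat_attn = attn.reshape(-1)
    idx = np.argsort(flat_attn)[::-1][:k]  # indices of the top-k attention values
    return flat_feats[idx], idx

def mutual_nn_matches(desc_q, desc_r):
    """Mutual nearest-neighbour matching between two descriptor sets,
    using cosine similarity on L2-normalised descriptors."""
    q = desc_q / np.linalg.norm(desc_q, axis=1, keepdims=True)
    r = desc_r / np.linalg.norm(desc_r, axis=1, keepdims=True)
    sim = q @ r.T
    nn_q = sim.argmax(axis=1)  # best reference feature for each query feature
    nn_r = sim.argmax(axis=0)  # best query feature for each reference feature
    return [(i, j) for i, j in enumerate(nn_q) if nn_r[j] == i]

# Synthetic query/reference feature maps and attention maps (16x16x64).
rng = np.random.default_rng(0)
fmap_q = rng.standard_normal((16, 16, 64))
fmap_r = rng.standard_normal((16, 16, 64))
attn_q = rng.random((16, 16))
attn_r = rng.random((16, 16))

# Matching 32 selected descriptors instead of all 256 locations
# is where the reduced reranking cost comes from.
desc_q, _ = select_local_features(fmap_q, attn_q, k=32)
desc_r, _ = select_local_features(fmap_r, attn_r, k=32)
matches = mutual_nn_matches(desc_q, desc_r)
print(len(matches))
```

The number of surviving mutual matches (or a score accumulated over them) can then rerank the global-retrieval candidates; since only `k` of the `H*W` locations are matched, the pairwise-similarity cost drops quadratically with the pruning ratio.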
