QETR: A Query-Enhanced Transformer for Remote Sensing Image Object Detection
IEEE Geoscience and Remote Sensing Letters (IF 4.8), Pub Date: 2024-03-18, DOI: 10.1109/lgrs.2024.3378531
Xinyu Ma 1 , Pengyuan Lv 1 , Yanfei Zhong 2

Transformer models have recently been introduced into remote sensing image object detection because of their ability to model long-range dependencies. However, existing transformer-based detection methods mainly consider the global interaction of local elements and have a limited ability to enhance local information, which makes it difficult to distinguish real objects from a complex background. In this letter, a query-enhanced transformer (QETR) model is proposed to address these problems. The model consists of three main parts: an encoder, a decoder, and a detection head. In the encoder, a Swin transformer extracts deep features. In the decoder, the object and anchor queries are initialized, and the feature and position information of the objects is learned by the multihead self-attention (MHSA) and cross-attention mechanisms, respectively. Furthermore, a query align (QA) module and a scale controller are proposed to enhance the object information around the local queries by limiting the attention to a certain range without losing important information. Finally, the detection head outputs the boundaries and types of the objects based on bipartite matching. To verify the effectiveness of the proposed method, comparative experiments were carried out against other state-of-the-art methods on two public datasets: the High-Resolution Remote Sensing Detection (HRRSD) dataset and the object detection in optical remote sensing images (DIOR) dataset. The experimental results confirm the effectiveness and superiority of the QETR model, which achieved mean average precision (mAP) values of 71.5% on the DIOR dataset and 91.1% on the HRRSD dataset.
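
The bipartite matching step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cost function (negative class probability plus a weighted L1 box distance), the `box_weight` value, and all function names are assumptions chosen for illustration, and the exhaustive search stands in for the Hungarian algorithm that practical detectors typically use. It only shows the idea of assigning each ground-truth object to exactly one query at minimal total cost.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """Sum of absolute differences between two (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_cost(pred, gt, box_weight=5.0):
    """Hypothetical cost: low when the class probability is high and boxes agree."""
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])


def bipartite_match(preds, gts):
    """Exhaustively search one-to-one assignments of ground truths to queries.

    Returns a sorted list of (query_index, gt_index) pairs with minimal total
    cost. Assumes there are at least as many queries as ground-truth objects.
    Brute force is only viable for a handful of objects.
    """
    best_cost, best_pairs = float("inf"), []
    for query_ids in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[q], gts[g]) for g, q in enumerate(query_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((q, g) for g, q in enumerate(query_ids))
    return best_pairs


# Three query predictions (made-up numbers) and two ground-truth objects.
preds = [
    {"probs": [0.1, 0.9], "box": (0.52, 0.50, 0.20, 0.20)},
    {"probs": [0.8, 0.2], "box": (0.10, 0.12, 0.05, 0.05)},
    {"probs": [0.5, 0.5], "box": (0.90, 0.90, 0.10, 0.10)},
]
gts = [
    {"label": 0, "box": (0.10, 0.10, 0.05, 0.05)},
    {"label": 1, "box": (0.50, 0.50, 0.20, 0.20)},
]

matches = bipartite_match(preds, gts)
print(matches)  # [(0, 1), (1, 0)]
```

In this toy case query 0 is matched to the second object and query 1 to the first, while query 2 stays unmatched and would be treated as "no object" in a set-prediction loss.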

Updated: 2024-03-18