CrossFormer: Cross-guided attention for multi-modal object detection
Pattern Recognition Letters (IF 5.1) · Pub Date: 2024-02-15 · DOI: 10.1016/j.patrec.2024.02.012
Seungik Lee, Jaehyeong Park, Jinsun Park

Object detection is one of the essential tasks in a variety of real-world applications such as autonomous driving and robotics. In real-world scenarios, unfortunately, there are numerous challenges such as illumination changes, adverse weather conditions, and geographical changes, to name a few. To tackle these challenges, we propose a novel multi-modal object detection model built upon a hierarchical transformer and cross-guidance between different modalities. The proposed hierarchical transformer consists of domain-specific feature extraction networks whose intermediate features are connected by the proposed Cross-Guided Attention Module (CGAM) to enrich their representational power. Specifically, in the CGAM, one modality is regarded as the guide and the other as the base; cross-modal attention from the guide is then applied to the base feature. The CGAM works bidirectionally in parallel by exchanging the roles of the two modalities, refining both multi-modal features simultaneously. Experimental results on the FLIR-aligned, LLVIP, and KAIST multispectral pedestrian datasets demonstrate that the proposed method is superior to previous multi-modal detection algorithms both quantitatively and qualitatively.
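
The abstract does not give implementation details for the CGAM, but the described mechanism (guide supplies the attention context, base is refined, roles exchanged bidirectionally) can be illustrated with a minimal PyTorch-style sketch using standard multi-head cross-attention. All class, variable, and parameter names below (CrossGuidedAttention, BidirectionalCGAM, rgb_feat, thermal_feat, the residual/LayerNorm choice) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CrossGuidedAttention(nn.Module):
    """Sketch of one direction of cross-guided attention.

    The guide modality supplies keys/values, the base modality supplies
    queries, so the base feature is refined by attending to the guide.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, base: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # base, guide: (B, N, C) token sequences from flattened feature maps
        refined, _ = self.attn(query=base, key=guide, value=guide)
        return self.norm(base + refined)  # residual connection on the base feature


class BidirectionalCGAM(nn.Module):
    """Applies cross-guided attention in both directions in parallel,
    exchanging the guide/base roles between the two modalities."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.rgb_from_thermal = CrossGuidedAttention(dim, num_heads)
        self.thermal_from_rgb = CrossGuidedAttention(dim, num_heads)

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor):
        rgb_refined = self.rgb_from_thermal(base=rgb_feat, guide=thermal_feat)
        thermal_refined = self.thermal_from_rgb(base=thermal_feat, guide=rgb_feat)
        return rgb_refined, thermal_refined


if __name__ == "__main__":
    B, N, C = 2, 64 * 64, 256  # batch, tokens (flattened 64x64 map), channels
    rgb = torch.randn(B, N, C)
    thermal = torch.randn(B, N, C)
    cgam = BidirectionalCGAM(dim=C)
    rgb_out, thermal_out = cgam(rgb, thermal)
    print(rgb_out.shape, thermal_out.shape)  # torch.Size([2, 4096, 256]) each
```

In this sketch, both directions share the same structure but have separate weights, so each modality's intermediate feature is refined by the other before being passed on to the detection head.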
