HFENet: Hybrid feature encoder network for detecting salient objects in RGB-thermal images
Digital Signal Processing (IF 2.9), Pub Date: 2024-02-23, DOI: 10.1016/j.dsp.2024.104439
Fan Sun, Wujie Zhou, Weiqing Yan, Yulai Zhang

Deep convolutional neural networks (CNNs) have gained prominence in computer vision applications, including RGB salient object detection (SOD), owing to advances in deep learning. Nevertheless, most deep CNNs employ either VGGNet or ResNet as the backbone for extracting image information, which can lead to the following problems: 1) variations between imaging modalities during per-layer feature extraction, because cross-modal features across layers are often fused in a single step, resulting in inadequate cross-modal feature extraction; 2) loss of long-range dependencies between features during multilayer feature decoding; and 3) blurred object boundaries. To address these issues, we leverage the complementary strengths of the VGGNet and ResNet architectures and present a novel hybrid VGG–ResNet feature encoder for RGB-T SOD. Specifically, we introduce a geometry information aggregation module that effectively combines and enhances the VGGNet spatial features of the RGB-T modalities from the bottom up. Moreover, we propose an innovative global saliency perception module that progressively refines the ResNet semantic features from the top down by integrating both local and global information. Furthermore, we introduce a Pearson-gated module to tackle the challenge of long-range dependence between features; it merges features through gating weights computed from the Pearson correlation coefficients of the fused features at multiple levels. Lastly, we devise an edge-aware module that precisely learns the contours of salient objects, thereby sharpening object boundaries. Extensive experiments on three RGB-T SOD benchmarks demonstrate that the proposed network surpasses state-of-the-art SOD methods.
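The Pearson-gated fusion is the most concrete mechanism the abstract describes, so a minimal sketch of how such a gate might be wired is given below, assuming PyTorch and two same-shape decoder feature maps. The class name PearsonGate, the 3x3 fusion convolution, and the mapping of the correlation to a [0, 1] gate are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): gate the fusion of two feature maps
# by the Pearson correlation between their flattened responses.
import torch
import torch.nn as nn

class PearsonGate(nn.Module):
    """Hypothetical Pearson-gated fusion of two (B, C, H, W) feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    @staticmethod
    def pearson(a, b, eps=1e-6):
        # Flatten channel/spatial dims and compute a per-sample correlation in [-1, 1].
        a = a.flatten(1)
        b = b.flatten(1)
        a = a - a.mean(dim=1, keepdim=True)
        b = b - b.mean(dim=1, keepdim=True)
        num = (a * b).sum(dim=1)
        den = a.norm(dim=1) * b.norm(dim=1) + eps
        return (num / den).view(-1, 1, 1, 1)

    def forward(self, f_low, f_high):
        # Map the correlation to a gate in [0, 1]: strongly correlated inputs
        # rely on the fused branch, weakly correlated ones keep more of f_high.
        g = (self.pearson(f_low, f_high) + 1) / 2
        fused = self.fuse(torch.cat([f_low, f_high], dim=1))
        return g * fused + (1 - g) * f_high

# Example usage with dummy decoder features of matching shape.
gate = PearsonGate(64)
out = gate(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)
```

In this reading, the correlation acts as a cheap global measure of cross-level agreement, so the gate needs no learned parameters beyond the fusion convolution; whether the paper applies it per level, per channel, or with additional learned weights is not specified in the abstract.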

Updated: 2024-02-23