Multi-scale spatial pyramid attention mechanism for image recognition: An effective approach
Engineering Applications of Artificial Intelligence (IF 8) Pub Date: 2024-04-05, DOI: 10.1016/j.engappai.2024.108261
Yang Yu, Yi Zhang, Zeyu Cheng, Zhe Song, Chengkai Tang

Attention mechanisms have gradually become necessary to enhance the representational power of convolutional neural networks (CNNs). Despite recent progress in attention mechanism research, some open problems remain. Most existing methods neglect to model multi-scale feature representations, structural information, and long-range channel dependencies, which are essential for producing more discriminative attention maps. This study proposes a novel, low-overhead, high-performance attention mechanism with strong generalization ability across various networks and datasets. The mechanism, called Multi-Scale Spatial Pyramid Attention (MSPA), addresses the limitations of existing attention methods. As its critical components, we develop the Hierarchical-Phantom Convolution (HPC) module, which extracts multi-scale spatial information at a more granular level by utilizing hierarchical residual-like connections, and design the Spatial Pyramid Recalibration (SPR) module, which integrates structural regularization and structural information through an adaptive combination mechanism while employing the Softmax operation to build long-range channel dependencies. The proposed MSPA is a powerful tool that can be conveniently embedded into various CNNs as a plug-and-play component. Accordingly, by using MSPA to replace the 3 × 3 convolution in the bottleneck residual blocks of ResNets, we create a series of simple and efficient backbones named MSPANet, which naturally inherit the advantages of MSPA. Without bells and whistles, our method substantially outperforms state-of-the-art counterparts on all evaluation metrics in extensive experiments on CIFAR-100 and ImageNet-1K image recognition. When applying MSPA to ResNet-50, our model achieves top-1 classification accuracy of 81.74% and 78.40% on the CIFAR-100 and ImageNet-1K benchmarks, exceeding the corresponding baselines by 3.95% and 2.27%, respectively. We also obtain promising performance improvements of 1.15% and 0.91% over the competitive EPSANet-50. In addition, empirical results in autonomous driving engineering applications demonstrate that our method can significantly improve the accuracy and real-time performance of image recognition at lower overhead. Our code is publicly available at .
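
The abstract only describes HPC and SPR at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of how such a plug-and-play attention block could replace the 3 × 3 convolution in a ResNet bottleneck. The internals are assumptions inferred from the abstract alone: HPC is sketched as hierarchical split 3 × 3 convolutions with residual-like connections between splits, and SPR as pyramid-pooled channel descriptors recalibrated with a Softmax over channels; the class names, split count, and pyramid sizes are illustrative.

```python
# Hypothetical sketch of an MSPA-style bottleneck, assuming Res2Net-like
# hierarchical splits for HPC and pyramid-pooled Softmax channel weights for SPR.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HPC(nn.Module):
    """Assumed Hierarchical-Phantom Convolution: split channels into groups and
    process them with 3x3 convolutions linked by residual-like connections."""

    def __init__(self, channels, splits=4):
        super().__init__()
        assert channels % splits == 0
        self.splits = splits
        width = channels // splits
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1, bias=False) for _ in range(splits - 1)]
        )

    def forward(self, x):
        chunks = torch.chunk(x, self.splits, dim=1)
        outs = [chunks[0]]          # first split passes through unchanged
        prev = chunks[0]
        for conv, chunk in zip(self.convs, chunks[1:]):
            prev = conv(chunk + prev)  # hierarchical residual-like connection
            outs.append(prev)          # later splits see larger receptive fields
        return torch.cat(outs, dim=1)


class SPR(nn.Module):
    """Assumed Spatial Pyramid Recalibration: pool at several spatial sizes,
    fuse the descriptors, and weight channels with a Softmax."""

    def __init__(self, channels, pyramid_sizes=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in pyramid_sizes)
        self.fc = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        # Collapse every pyramid level to a 1x1 descriptor and sum them.
        desc = sum(F.adaptive_avg_pool2d(p(x), 1) for p in self.pools)
        weights = torch.softmax(self.fc(desc), dim=1)  # long-range channel dependencies
        return x * weights


class MSPABottleneck(nn.Module):
    """ResNet bottleneck with the 3x3 convolution replaced by HPC + SPR."""

    def __init__(self, in_channels, mid_channels):
        super().__init__()
        out_channels = mid_channels * 4
        self.reduce = nn.Sequential(nn.Conv2d(in_channels, mid_channels, 1, bias=False),
                                    nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.mspa = nn.Sequential(HPC(mid_channels), SPR(mid_channels))
        self.expand = nn.Sequential(nn.Conv2d(mid_channels, out_channels, 1, bias=False),
                                    nn.BatchNorm2d(out_channels))
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv2d(in_channels, out_channels, 1, bias=False))

    def forward(self, x):
        return F.relu(self.expand(self.mspa(self.reduce(x))) + self.shortcut(x))


if __name__ == "__main__":
    block = MSPABottleneck(256, 64)
    print(block(torch.randn(2, 256, 56, 56)).shape)  # torch.Size([2, 256, 56, 56])
```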

Updated: 2024-04-05