Wise-SrNet: a novel architecture for enhancing image classification by learning spatial resolution of feature maps
Pattern Analysis and Applications (IF 3.9), Pub Date: 2024-03-09, DOI: 10.1007/s10044-024-01211-0
Mohammad Rahimzadeh, Soroush Parvin, Amirali Askari, Elnaz Safi, Mohammad Reza Mohammadi

One of the main challenges since the advent of convolutional neural networks has been how to connect the extracted feature map to the final classification layer. VGG models used two sets of fully connected layers for the classification part of their architectures, which significantly increased the number of model weights. ResNet and later deep convolutional models used a global average pooling (GAP) layer to compress the feature map and feed it to the classification layer. Although using the GAP layer reduces the computational cost, it also discards the spatial resolution of the feature map, which decreases learning efficiency. In this paper, we aim to tackle this problem by replacing the GAP layer with a new architecture called Wise-SrNet. It is inspired by depthwise convolution and is designed to process spatial resolution without increasing the computational cost. We evaluated our method on three datasets: the Intel Image Classification Challenge, MIT Indoors Scenes, and part of the ImageNet dataset. We investigated the implementation of our architecture on several models of the Inception, ResNet, and DenseNet families. Applying our architecture significantly increased convergence speed and accuracy. Our experiments on images with 224×224 resolution increased the Top-1 accuracy by 2 to 8% on different datasets and models. Running our models on 512×512 images of the MIT Indoors Scenes dataset yielded a notable improvement in Top-1 accuracy of 3 to 26%. We also demonstrate the GAP layer's disadvantage when the input images are large and the number of classes is not small. In this circumstance, our proposed architecture can greatly help enhance classification results. The code is shared at https://github.com/mr7495/image-classification-spatial.
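As a rough illustration of the idea described in the abstract, the Keras sketch below swaps the usual GAP head of a backbone for a depthwise convolution whose kernel covers the whole feature map, so each channel's spatial grid is reduced to one value through learned weights rather than a uniform average. This is a minimal sketch under assumed layer choices and hyperparameters, not the authors' exact Wise-SrNet head; their implementation is in the linked repository.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(num_classes: int, input_shape=(224, 224, 3)):
    """Hypothetical GAP replacement: a full-kernel depthwise convolution head."""
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_shape=input_shape
    )
    x = backbone.output            # e.g. a 7x7xC feature map for 224x224 input
    spatial = x.shape[1]           # spatial side of the feature map (7 here)

    # Learned spatial reduction: one k x k filter per channel (depthwise),
    # adding only k*k weights per channel instead of a large dense layer.
    x = layers.DepthwiseConv2D(
        kernel_size=spatial, padding="valid", use_bias=False
    )(x)                           # -> 1x1xC
    x = layers.Flatten()(x)        # -> C-dimensional vector
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(backbone.input, outputs)

model = build_classifier(num_classes=10)
model.summary()
```

Compared with GAP, the depthwise kernel lets the classifier weight spatial positions differently per channel, which is the kind of spatial information the paper argues is lost by uniform averaging.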




Updated: 2024-03-11