PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
arXiv - CS - Machine Learning. Pub Date: 2024-03-26. DOI: arXiv-2403.17695
Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley

We present PlainMamba: a simple non-hierarchical state space model (SSM) designed for general visual recognition. The recent Mamba model has shown how SSMs can be highly competitive with other architectures on sequential data, and initial attempts have been made to apply it to images. In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating, which enables the model to discern the spatial relations of tokens by encoding directional information. Our architecture is designed to be easy to use and easy to scale: it is formed by stacking identical PlainMamba blocks, yielding a model with constant width throughout all layers. The architecture is further simplified by removing the need for special tokens. We evaluate PlainMamba on a variety of visual recognition tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves performance gains over previous non-hierarchical models and is competitive with hierarchical alternatives. In particular, for tasks requiring high-resolution inputs, PlainMamba requires much less compute while maintaining high performance. Code and models are available at https://github.com/ChenhongyiYang/PlainMamba
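The adjacency property named in point (i) can be illustrated with a small sketch. This is not the authors' code; it is a hypothetical "snake" traversal of an H×W grid of patch tokens, shown only to make concrete what it means for every pair of consecutive tokens in the 1D scan sequence to be spatially adjacent in 2D (unlike a plain raster scan, which jumps across the image at each row boundary):

```python
def continuous_scan_order(h, w):
    """Return (row, col) positions visiting an h x w grid so that each
    step moves to a 4-neighbour of the previous position: even rows are
    traversed left-to-right, odd rows right-to-left."""
    order = []
    for row in range(h):
        cols = range(w) if row % 2 == 0 else range(w - 1, -1, -1)
        for col in cols:
            order.append((row, col))
    return order


order = continuous_scan_order(3, 4)
# Every consecutive pair of tokens differs by exactly one grid step,
# i.e. the scan never teleports across the image.
assert all(
    abs(r1 - r2) + abs(c1 - c2) == 1
    for (r1, c1), (r2, c2) in zip(order, order[1:])
)
```

A row-major raster scan fails this check at every row boundary, which is precisely the discontinuity a continuous 2D scan avoids; the paper's actual scanning scheme may differ in its details.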

Updated: 2024-03-27