Hierarchical Attention Transformer for Hyperspectral Image Classification
IEEE Geoscience and Remote Sensing Letters (IF 4.8), Pub Date: 2024-03-20, DOI: 10.1109/LGRS.2024.3379509
Tahir Arshad, Junping Zhang

Hyperspectral image (HSI) data contain rich spectral–spatial information, which can be useful for various applications. Many methods have been proposed to classify HSIs. Nonetheless, the limited availability of training samples frequently weakens the ability of traditional models to handle the inherent complexity of the task. Deep learning models have been successfully applied in the field of remote sensing. In this letter, we propose a vision transformer (ViT)-based network called the hierarchical attention transformer, which combines the local representation learning of 3-D and 2-D convolutional neural networks (CNNs) with the potent global modeling capability of the ViT. We leverage the efficiency of window-based self-attention; within each window, dedicated tokens contribute to both local and global representation learning. The proposed model achieved overall accuracies (OAs) of 99.70%, 99.89%, 99.56%, 81.75%, and 99.59% on five datasets.
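The efficiency gain from window-based self-attention comes from restricting attention to non-overlapping local windows, so cost grows with the window size rather than with the full spatial extent. The following is a minimal NumPy sketch of that idea only — the window partitioning and per-window attention — not the authors' implementation; all dimensions, projection matrices, and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, ws):
    # Split an (H, W, C) feature map into non-overlapping ws x ws windows,
    # returning (num_windows, ws*ws, C) token sequences.
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_self_attention(x, ws, Wq, Wk, Wv):
    # Attention is computed independently inside each window, reducing the
    # cost from O((HW)^2) for global attention to O(HW * ws^2).
    windows = window_partition(x, ws)                  # (nW, ws*ws, C)
    q, k, v = windows @ Wq, windows @ Wk, windows @ Wv
    scale = q.shape[-1] ** -0.5
    attn = softmax(q @ k.transpose(0, 2, 1) * scale)   # (nW, ws*ws, ws*ws)
    return attn @ v                                    # (nW, ws*ws, C)

# Toy example: an 8x8 spatial map with 16 channels, split into 4x4 windows.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out = window_self_attention(x, 4, Wq, Wk, Wv)
print(out.shape)  # (4, 16, 16): 4 windows, 16 tokens per window, 16 channels
```

In a full model such as the one described above, these per-window outputs would be projected, merged back into the spatial map, and interleaved with the CNN branches; this sketch stops at the attention step itself.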
