
CarveNet: a channel-wise attention-based network for irregular scene text recognition
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2022-04-05, DOI: 10.1007/s10032-022-00398-4
Guibin Wu, Zheng Zhang, Yongping Xiong

Although scene text recognition has made considerable progress in recent years, recognizing irregular text in natural scenes remains challenging because of distortion and background interference. Prior works use either a spatial transformer network (STN) or a 2D attention mechanism to improve recognition accuracy. However, STN-based methods are not robust because of their limited network capacity, while 2D attention-based methods are strongly affected by blur, distortion, and background. In this paper, we propose CarveNet, a text recognition model that consists of three substructures: a feature extractor, a feature filter, and a decoder. The feature extractor uses a Feature Pyramid Network (FPN) to aggregate multi-scale hierarchical feature maps and obtain a larger receptive field. The feature filter, composed of stacked Residual Channel Attention Blocks, then separates text features from background interference. The 2D self-attention-based decoder generates the text sequence from the output of the feature filter and the previously generated symbols. Extensive evaluation shows that CarveNet achieves state-of-the-art results on both regular and irregular scene text recognition benchmarks. Compared with previous work based on 2D self-attention, CarveNet improves accuracy by 2.3% and 4.6% on the irregular datasets SVTP and CT80, respectively.
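The abstract does not describe the internals of the Residual Channel Attention Block, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a squeeze-and-excitation style gate (global average pooling per channel, a small bottleneck, a sigmoid) followed by a residual addition, and it omits the convolutions a real block would contain. The function names and the weight matrices `w_down` and `w_up` are hypothetical and exist only for illustration.

```python
import math


def channel_attention_weights(feature_map, w_down, w_up):
    """Return one gate in (0, 1) per channel.

    feature_map: list of C channels, each an H x W list of lists of floats.
    w_down: R x C matrix (hypothetical bottleneck that reduces C to R).
    w_up:   C x R matrix (hypothetical expansion back to C channels).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Bottleneck: linear -> ReLU -> linear -> sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_up]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def residual_channel_attention(feature_map, w_down, w_up):
    """Rescale each channel by its gate, then add the input back (residual).

    Simplification (assumption): no convolutions are applied before the gate,
    so the output is x + gate * x for every element x of a channel.
    """
    gates = channel_attention_weights(feature_map, w_down, w_up)
    return [
        [[x + g * x for x in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


if __name__ == "__main__":
    # Two 2x2 channels: one with activations, one empty.
    features = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    w_down = [[1.0, 0.0]]      # R = 1, C = 2
    w_up = [[1.0], [-1.0]]     # C = 2, R = 1
    print(channel_attention_weights(features, w_down, w_up))
    print(residual_channel_attention(features, w_down, w_up))
```

In this toy setup the gate for the first channel comes out larger than the gate for the second, which illustrates how a channel-wise gate can emphasize some feature channels and suppress others. In a trained network the weights would be learned, and which channels carry text versus background would be determined by the data.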

Updated: 2022-04-05