CarveNet: a channel-wise attention-based network for irregular scene text recognition
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2022-04-05, DOI: 10.1007/s10032-022-00398-4
Guibin Wu 1 , Zheng Zhang 1 , Yongping Xiong 1

Although considerable progress has been made in recent years, recognizing irregular text in natural scenes remains a challenging problem because of distortion and background interference. Prior works use either a spatial transformer network (STN) or a 2D attention mechanism to improve recognition accuracy. However, STN-based methods are not robust because of their limited network capacity, while 2D attention-based methods are strongly affected by blur, distortion and background. In this paper, we propose CarveNet, a text recognition model that consists of three substructures: a feature extractor, a feature filter and a decoder. The feature extractor uses an FPN (Feature Pyramid Network) to aggregate multi-scale hierarchical feature maps and obtain a larger receptive field. The feature filter, composed of stacked Residual Channel Attention Blocks, then separates text features from background interference. The 2D self-attention-based decoder generates the text sequence from the output of the feature filter and the previously generated symbols. Extensive evaluation results show that CarveNet achieves state-of-the-art performance on both regular and irregular scene text recognition benchmark datasets. Compared with the previous work based on 2D self-attention, CarveNet improves accuracy by 2.3% and 4.6% on the irregular datasets SVTP and CT80, respectively.
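
To make the idea of the feature filter concrete, here is a minimal sketch of channel-wise attention with a residual connection, written in plain Python. It is not the authors' implementation: the abstract gives no layer details, so the squeeze-and-excitation style gate, the function names (`channel_gates`, `residual_channel_attention`) and the weight matrices `w_down` and `w_up` are all assumptions made for illustration, and the convolutions a real block would contain are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(feat, w_down, w_up):
    """Compute one gate in (0, 1) per channel.

    feat   : list of C channels, each an H x W nested list of floats
    w_down : (C // r) x C weights of the bottleneck layer (hypothetical)
    w_up   : C x (C // r) weights of the expansion layer (hypothetical)
    """
    # Squeeze: global average pooling reduces each channel to one number.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_up]


def residual_channel_attention(feat, w_down, w_up):
    """Rescale every channel by its gate and add the input back (residual)."""
    gates = channel_gates(feat, w_down, w_up)
    out = [[[v + g * v for v in row] for row in ch] for g, ch in zip(gates, feat)]
    return out, gates


if __name__ == "__main__":
    # Two 2x2 channels; the toy weights favour channel 0 over channel 1.
    feat = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    out, gates = residual_channel_attention(feat, [[1.0, 1.0]], [[1.0], [-1.0]])
    print("gates:", gates)
    print("output:", out)
```

In this sketch, channels with a gate near 1 are amplified and channels with a gate near 0 pass through almost unchanged via the residual path, which is one plausible way a stack of such blocks could emphasize text features over background.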




Updated: 2022-04-05