Scene Text Detection Using HRNet and Spatial Attention Mechanism
Programming and Computer Software (IF 0.7), Pub Date: 2024-01-24, DOI: 10.1134/s0361768823080212
Qingsong Tang, Zhangyan Jiang, Bolin Pan, Jinting Guo, Wuming Jiang

Abstract

To better extract features from text instances of various shapes, this paper proposes a scene text detector based on the High-Resolution Network (HRNet) and a spatial attention mechanism. Specifically, we use HRNetv2-W18 as the backbone network to extract features from text instances with complex shapes. Because scene text instances are usually small, and to keep the feature maps from becoming too small, we optimize HRNet with deformable convolution and the Smooth Maximum Unit (SMU) activation function so that the network retains more detail and location information about each text instance. In addition, a Text Region Attention Module (TRAM) is added after the backbone to make the network attend more closely to text location information, and a loss function is applied to TRAM so that the network learns these features better. Experimental results show that the proposed method is competitive with state-of-the-art methods. Code is available at: https://github.com/zhangyan1005/HR-DBNet.
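
The abstract names the SMU activation without defining it. As a minimal sketch, assuming the commonly cited form SMU(x) = ((1 + α)x + (1 − α)x · erf(μ(1 − α)x)) / 2, which smooths max(x, αx), the function can be written in plain Python as follows. The values of `alpha` and `mu` and the helper `leaky_relu` are illustrative assumptions, not settings taken from the paper or its repository.

```python
import math


def smu(x: float, alpha: float = 0.25, mu: float = 1.0) -> float:
    """Smooth approximation of max(x, alpha * x) for alpha < 1.

    Uses max(a, b) = ((a + b) + |a - b|) / 2 with |z| replaced by
    z * erf(mu * z). alpha and mu here are assumed illustrative values.
    """
    diff = (1.0 - alpha) * x  # a - b with a = x, b = alpha * x
    return 0.5 * ((1.0 + alpha) * x + diff * math.erf(mu * diff))


def leaky_relu(x: float, alpha: float = 0.25) -> float:
    """Hard (non-smooth) counterpart, shown only for comparison."""
    return max(x, alpha * x)


if __name__ == "__main__":
    # Compare the smooth and hard activations at a few sample points.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, smu(value), leaky_relu(value))
```

In this sketch a larger `mu` brings the curve closer to the hard maximum, while a smaller `mu` gives a smoother transition around zero.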




Updated: 2024-01-25