Few-shot learning for word-level scene text script identification
Computational Intelligence (IF 2.8), Pub Date: 2023-11-21, DOI: 10.1111/coin.12612
Veronica Naosekpam, Nilkanta Sahu

Script identification of text in scene images has attracted considerable attention recently. However, existing techniques primarily focus on scripts for which data are abundant, such as English, European, or East Asian scripts. Although these methods are robust on high-resource data, how they perform on low-resource scripts has yet to be established. In India, for example, text scripts differ considerably across the country's demographics. To bridge the research gap in resource-constrained script identification, we present a few-shot learning network called TextScriptFSLNet. This network does not require large amounts of training data, yet it achieves state-of-the-art performance on benchmark datasets. Our proposed method follows a $C$-way $K$-shot paradigm, splitting the training set into a support set and a query set. From the support set, the network learns representative knowledge of each class and creates its prototype. We use a 2-layer convolutional neural network fused with multi-kernel spatial attention, together with an averaging technique, to generate the prototype of each class. The spatial attention helps capture the important information in an image and enriches the feature representation. To the best of our knowledge, the proposed work is the first of its kind in the scene text understanding domain. Additionally, we created a dataset called Indic-FSL2023 comprising 10 of the 22 officially recognized Indian scripts. The proposed method achieves the highest accuracy among the tested methods on the newly created Indic-FSL2023. Experiments are also conducted on MLe2e to demonstrate the method's versatility. Furthermore, we show how the proposed model performs under illumination changes and blur in scene text script images.
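
The episode structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are random stand-ins for what the attention-fused CNN would produce, the squared Euclidean distance to the nearest prototype is an assumption borrowed from standard prototypical networks (the abstract does not name the metric), and all function names and the `script_0`-style labels are hypothetical.

```python
import random


def sample_episode(data, c_way, k_shot, n_query, rng):
    """Draw one C-way K-shot episode from `data` (label -> list of vectors)."""
    classes = rng.sample(sorted(data), c_way)
    support, query = {}, []
    for label in classes:
        items = rng.sample(data[label], k_shot + n_query)
        support[label] = items[:k_shot]                   # K support examples
        query.extend((vec, label) for vec in items[k_shot:])
    return support, query


def prototype(vectors):
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric, not stated in the abstract)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(vec, prototypes):
    """Assign a query embedding to the class of its nearest prototype."""
    return min(prototypes, key=lambda label: sq_dist(vec, prototypes[label]))


def make_toy_data(n_classes=5, per_class=20, dim=8, seed=0):
    """Synthetic, well-separated clusters standing in for CNN features."""
    rng = random.Random(seed)
    data = {}
    for c in range(n_classes):
        centre = [10.0 * c] * dim
        data[f"script_{c}"] = [
            [m + rng.gauss(0, 1) for m in centre] for _ in range(per_class)
        ]
    return data


def run_episode(seed=0, c_way=3, k_shot=5, n_query=4):
    """Build prototypes from the support set and score the query set."""
    rng = random.Random(seed)
    data = make_toy_data(seed=seed)
    support, query = sample_episode(data, c_way, k_shot, n_query, rng)
    prototypes = {label: prototype(vecs) for label, vecs in support.items()}
    correct = sum(classify(vec, prototypes) == label for vec, label in query)
    return correct / len(query), prototypes


if __name__ == "__main__":
    accuracy, protos = run_episode()
    print(f"{len(protos)}-way episode, query accuracy = {accuracy:.2f}")
```

In a real setting the toy vectors would be replaced by embeddings of word-level scene text images, and many such episodes would be sampled during training.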

Updated: 2023-11-26