A Hybrid Scene Text Script Identification Network for regional Indian Languages



ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2024-02-24, DOI: 10.1145/3649439
Veronica Naosekpam, Nilkanta Sahu


In this work, we introduce WAFFNet, an attention-centric feature fusion architecture for word-level multilingual scene text script identification. Traditional approaches rely exclusively on either feature-based methods or deep learning strategies; our approach combines statistical and deep features to bridge the gap between the two. At its core, WAFFNet fuses Local Binary Pattern, a prominent descriptor that captures low-level texture features, with high-dimensional, semantically rich convolutional features. This fusion is augmented by a spatial attention mechanism that places targeted emphasis on semantically critical regions of the input image. To address class imbalance in the multi-class classification setting, we employ a weighted objective function, which also regularizes the learning process. WAFFNet is trained end to end, leveraging transfer learning to speed up convergence and improve performance. Considering the under-representation of regional Indian languages in current datasets, we curated IIITG-STLI2023, a comprehensive dataset covering English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Evaluation on IIITG-STLI2023, as well as on the established MLe2e and SIW-13 datasets, shows that WAFFNet outperforms both traditional feature-engineering approaches and state-of-the-art deep learning frameworks. The proposed WAFFNet framework thus offers a robust and effective solution for language identification in scene text images.
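As a rough illustration of the texture descriptor named in the abstract, the sketch below computes basic Local Binary Pattern codes and a normalized histogram for a small grayscale image. The abstract does not say which LBP variant, neighbourhood size, or pooling WAFFNet uses, so the 3x3, 8-neighbour form, the bit ordering, and the function names `lbp_codes` and `lbp_histogram` are all assumptions made for illustration only.

```python
def lbp_codes(image):
    """Return basic 3x3 LBP codes for the interior pixels of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    skipped because they lack a full 8-pixel neighbourhood.
    """
    # Neighbour offsets, clockwise from the top-left pixel. This ordering
    # is a convention assumed here, not one taken from the paper.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Return a 256-bin normalized histogram of the LBP codes of `image`."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    patch = [[1, 2, 3],
             [8, 5, 4],
             [7, 6, 5]]
    print(lbp_codes(patch))
```

A histogram of this kind is one common way to turn LBP codes into a fixed-length statistical feature vector that could be concatenated with convolutional features; how WAFFNet actually performs the fusion is not described in the abstract.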


Updated: 2024-02-24
