HWNet v3: a joint embedding framework for recognition and retrieval of handwritten text
International Journal on Document Analysis and Recognition ( IF 2.3 ) Pub Date : 2023-01-28 , DOI: 10.1007/s10032-022-00423-6
Praveen Krishnan , Kartik Dutta , C. V. Jawahar

Learning an efficient label embedding framework for word images enables effective word spotting in handwritten documents. In this work, we propose different label embedding schemes for word images using deep neural architectures and their representations. We refer to our first scheme as the two-stage label embedding technique, which projects both word images and their corresponding textual strings into a common subspace. We further introduce an end-to-end label embedding scheme using a deep neural architecture, which simplifies the embedding process and reports state-of-the-art performance for word spotting and recognition. We also validate the role of synthetic data as a complementary modality to further enhance the embedding process. On the challenging IAM handwritten dataset, we report an mAP of 0.9753 for query-by-string-based word spotting, while under lexicon-based word recognition, our proposed method reports character and word error rates of 1.67 and 3.62, respectively. We also present a detailed ablation study on several variants of our end-to-end embedding architecture and analyze performance under varying embedding sizes. We further validate the embedding scheme on degraded printed document datasets in both Latin and Indic scripts.
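
To make the query-by-string setting concrete, the following is a minimal sketch of how retrieval in a common subspace could be scored with mean average precision (mAP). It is not the authors' implementation: the function names, the use of cosine similarity for ranking, and the toy 2-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from the trained image and text branches.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def average_precision(ranked_relevance):
    # ranked_relevance: 0/1 flags for a ranked list, best match first.
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)
    precision_at_k = hits / np.arange(1, len(rel) + 1)
    # Average the precision values taken at the rank of each relevant item.
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(query_emb, query_labels, image_emb, image_labels):
    # Rank all word images for each string query by cosine similarity
    # (an assumed ranking rule), then average the per-query AP.
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    g = l2_normalize(np.asarray(image_emb, dtype=float))
    sims = q @ g.T
    aps = []
    for i, label in enumerate(query_labels):
        order = np.argsort(-sims[i])
        rel = [image_labels[j] == label for j in order]
        aps.append(average_precision(rel))
    return float(np.mean(aps))

# Hypothetical toy data: four word-image embeddings and two string-query
# embeddings that are assumed to live in the same 2-D subspace.
image_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
image_labels = ["the", "the", "of", "of"]
query_emb = np.array([[1.0, 0.05], [0.05, 1.0]])
query_labels = ["the", "of"]

score = mean_average_precision(query_emb, query_labels, image_emb, image_labels)
print(f"mAP = {score:.4f}")
```

Because both modalities share one space, the same ranking code would serve query-by-example spotting as well: only the source of the query embedding changes.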




Updated: 2023-01-30