Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2022-09-14, DOI: 10.1007/s10032-022-00410-x
Nishatul Majid, Elisa H. Barney Smith

This paper demonstrates a framework for offline handwriting recognition, based on character spotting and autonomous tagging, that is designed to work for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. Autonomous tagging automates the production of a character image training set by estimating character locations in a word from typical character sizes. Although scripts can differ widely from one another, the proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. We demonstrate the approach on handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% with precision tagging and 91.12% with autonomous tagging. We also explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts, and demonstrate this on handwritten Hangul (Korean), obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging removes one of the biggest frustrations, data annotation by hand, and we therefore believe it has the potential to transform the development of offline recognition systems.
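
The abstract describes autonomous tagging only at a high level: character locations in a word image are estimated from typical character sizes. As a rough illustration of that idea, and not the authors' actual procedure, the sketch below splits the horizontal extent of a word image among its characters in proportion to assumed relative widths. The function name `estimate_char_boxes`, the `typical_widths` table, and the purely horizontal, left-to-right layout are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

# (character, x_start, x_end) in pixels along the word image's horizontal axis
Box = Tuple[str, int, int]


def estimate_char_boxes(
    word: str,
    image_width: int,
    typical_widths: Dict[str, float],
    default_width: float = 1.0,
) -> List[Box]:
    """Estimate horizontal character spans in a word image.

    Assumption (not from the paper): each character occupies a share of the
    image width proportional to its typical relative width, and characters
    are laid out left to right without overlap.
    """
    if not word:
        return []

    # Look up a relative width per character; unknown characters get a default.
    weights = [typical_widths.get(ch, default_width) for ch in word]
    total = sum(weights)

    boxes: List[Box] = []
    cumulative = 0.0
    for ch, weight in zip(word, weights):
        x_start = round(image_width * cumulative / total)
        cumulative += weight
        x_end = round(image_width * cumulative / total)
        boxes.append((ch, x_start, x_end))
    return boxes


if __name__ == "__main__":
    # Hypothetical relative widths: "b" is assumed twice as wide as "a" or "c".
    widths = {"a": 1.0, "b": 2.0, "c": 1.0}
    for box in estimate_char_boxes("abc", 120, widths):
        print(box)
```

Boxes estimated this way would only be approximate labels. The appeal of such a scheme is that the transcription of a word plus a small table of typical sizes is enough to generate training annotations without tagging each character by hand.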




Updated: 2022-09-14