SoftCTC—semi-supervised learning for text recognition using soft pseudo-labels
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2023-10-06, DOI: 10.1007/s10032-023-00452-9
Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula

This paper explores semi-supervised training for sequence tasks, such as optical character recognition or automatic speech recognition. We propose a novel loss function, SoftCTC, an extension of CTC that allows multiple transcription variants to be considered at the same time. This makes it possible to omit the confidence-based filtering step, which is otherwise a crucial component of pseudo-labeling approaches to semi-supervised learning. We demonstrate the effectiveness of our method on a challenging handwriting recognition task and conclude that SoftCTC matches the performance of a finely tuned filtering-based pipeline. We also evaluate SoftCTC in terms of computational efficiency, concluding that it is significantly more efficient than a naïve CTC-based approach to training on multiple transcription variants, and we make our GPU implementation public.
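To make the "naïve CTC-based approach" concrete, here is a minimal sketch of one plausible reading of it: run the standard CTC forward algorithm once per transcription variant and combine the results with per-variant weights. This is not the authors' SoftCTC implementation, and the function names (`ctc_prob`, `naive_multi_variant_loss`), the weighting scheme, and the objective `-log(sum_k w_k * P(y_k | x))` are assumptions made for illustration; the paper's exact formulation may differ.

```python
import math


def ctc_prob(probs, target, blank=0):
    """Probability of `target` under CTC, via the standard forward algorithm.

    probs:  list of T per-frame distributions (each a list over the alphabet).
    target: list of label indices, without blanks.
    """
    # Extended target: a blank before, between, and after the labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def naive_multi_variant_loss(probs, variants):
    """Hypothetical baseline: one CTC forward pass per (target, weight) pair."""
    total = sum(weight * ctc_prob(probs, target) for target, weight in variants)
    return -math.log(total)
```

The cost of this sketch grows linearly with the number of variants, since each one needs its own forward pass. That is presumably the kind of overhead the efficiency comparison in the abstract refers to.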



Updated: 2023-10-06