Pho(SC)-CTC—a hybrid approach towards zero-shot word image recognition

Bhatt, Ravi; Rai, Anuj; Chanda, Sukalpa; Krishnan, Narayanan C.

doi:10.1007/s10032-022-00407-6

Original Paper
Published: 12 July 2022

Volume 26, pages 51–63 (2023)
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (such as historians and paleographers). In a real-life scenario, it is also not feasible to obtain sample images for every possible word. Zero-shot learning methods could, however, be used to recognize unseen or out-of-lexicon words in such historical document images. Building on "Pho(SC)Net", the previous state-of-the-art method for zero-shot word recognition, we propose Pho(SC)-CTC, a hybrid model that takes advantage of the rich features learned by Pho(SC)Net and passes them to a connectionist temporal classification (CTC) framework to perform the final classification. Encouraging results were obtained on two publicly available historical document datasets and one synthetic handwritten dataset, supporting the efficacy of both Pho(SC)-CTC and Pho(SC)Net.
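The abstract says the final classification is handed to a CTC framework. As a rough illustration of what that last step does at inference time, here is a minimal sketch of standard greedy (best-path) CTC decoding. It is not the authors' code: the per-frame probability matrix, the use of index 0 as the blank symbol, the toy alphabet `"abc"`, and the helper `peaked` are all hypothetical choices made for the example, and the paper may well use a different decoding strategy. The relevant property for zero-shot recognition is that the decoder emits a character sequence rather than picking from a fixed word list, so it can in principle produce words never seen during training.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank; class k >= 1 maps to alphabet[k - 1]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is a (hypothetical) network's probability of class k at time step t.
    """
    # Step 1: take the most likely class at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[str] = []
    prev = None
    for k in best_path:
        # Step 2: collapse consecutive repeats; Step 3: drop blanks.
        # A blank between two identical labels keeps them as separate characters.
        if k != prev and k != BLANK:
            decoded.append(alphabet[k - 1])
        prev = k
    return "".join(decoded)


def peaked(k: int, n_classes: int = 4, p: float = 0.7) -> List[float]:
    """Hypothetical helper: a frame distribution whose argmax is class k."""
    rest = (1.0 - p) / (n_classes - 1)
    return [p if i == k else rest for i in range(n_classes)]


# Best path c, c, blank, a, blank, a, b, b collapses to "caab".
frames = [peaked(k) for k in (3, 3, 0, 1, 0, 1, 2, 2)]
print(ctc_greedy_decode(frames, "abc"))  # prints caab
```

Greedy decoding is the simplest CTC decoder; beam search or lexicon-constrained decoding are common alternatives when a language model or word list is available.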

Author information

Ravi Bhatt and Anuj Rai contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, 140001, India
Ravi Bhatt, Anuj Rai & Narayanan C. Krishnan
Department of Computer Science and Communication, Østfold University College, 1757, Halden, Norway
Sukalpa Chanda

Corresponding author

Correspondence to Ravi Bhatt.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhatt, R., Rai, A., Chanda, S. et al. Pho(SC)-CTC—a hybrid approach towards zero-shot word image recognition. IJDAR 26, 51–63 (2023). https://doi.org/10.1007/s10032-022-00407-6

Received: 25 November 2021
Revised: 06 April 2022
Accepted: 15 June 2022
Published: 12 July 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10032-022-00407-6
