Abstract
Annotating words in a historical document image archive for word image recognition demands time and skilled human resources (such as historians and paleographers). In a real-life scenario, obtaining sample images for all possible words is also not feasible. However, zero-shot learning methods are well suited to recognizing unseen (out-of-lexicon) words in such historical document images. Building on Pho(SC)Net, the previous state-of-the-art method for zero-shot word recognition, we propose a hybrid model, Pho(SC)-CTC, that takes advantage of the rich features learned by Pho(SC)Net and passes them to a connectionist temporal classification (CTC) framework to perform the final classification. Encouraging results were obtained on two publicly available historical document datasets and one synthetic handwritten dataset, which support the efficacy of Pho(SC)-CTC and Pho(SC)Net.
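To illustrate the final classification step in general terms, the following is a minimal sketch of best-path (greedy) CTC decoding: take the most likely class per time step, merge consecutive repeats, then drop the blank symbol. It is not the authors' implementation. The function name `ctc_greedy_decode`, the blank index, the toy alphabet, and the per-frame scores are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class index at each time step.
    best_path: List[int] = [
        max(range(len(row)), key=row.__getitem__) for row in frame_scores
    ]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != BLANK:
            decoded.append(alphabet[label - 1])  # index 0 is reserved for the blank
        previous = label
    return "".join(decoded)


# Toy per-frame scores over [blank, 'a', 'b', 'c']; the values are made up.
frames = [
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.70, 0.10, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.03, 0.02],  # blank (separates the two a's)
    [0.20, 0.60, 0.10, 0.10],  # a
    [0.10, 0.10, 0.70, 0.10],  # b
    [0.10, 0.20, 0.60, 0.10],  # b (repeat, merged)
]
result = ctc_greedy_decode(frames, "abc")
print(result)
```

Because the decoder spells out a character sequence rather than choosing from a fixed word list, a word never seen during training can still be produced, which is the property that makes a CTC output stage a natural fit for out-of-lexicon recognition.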
Cite this article
Bhatt, R., Rai, A., Chanda, S. et al. Pho(SC)-CTC—a hybrid approach towards zero-shot word image recognition. IJDAR 26, 51–63 (2023). https://doi.org/10.1007/s10032-022-00407-6
DOI: https://doi.org/10.1007/s10032-022-00407-6