
Scene text detection using structured information and an end-to-end trainable generative adversarial networks

  • Original Paper
Pattern Analysis and Applications

Abstract

Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module generates diverse and variable text regions. The GAN module then refines and enhances these regions, ensuring heightened realism and accuracy. Finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained end to end by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss: the VAE loss encourages diversity in the generated text regions, the GAN loss enforces realism and accuracy, and the text detection loss drives high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
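The abstract states only that the joint objective combines the VAE, GAN, and detection losses; a minimal sketch of such an objective, assuming hypothetical scalar weights $\lambda$ and the standard evidence-lower-bound form of the VAE term (neither of which is specified in the article), is

$$\mathcal{L}_{\text{total}} = \lambda_{\mathrm{VAE}}\,\mathcal{L}_{\mathrm{VAE}} + \lambda_{\mathrm{GAN}}\,\mathcal{L}_{\mathrm{GAN}} + \lambda_{\mathrm{det}}\,\mathcal{L}_{\mathrm{det}}, \qquad \mathcal{L}_{\mathrm{VAE}} = \mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z)\right] + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right),$$

where $q_\phi$ and $p_\theta$ denote the VAE encoder and decoder, and $\mathcal{L}_{\mathrm{GAN}}$ and $\mathcal{L}_{\mathrm{det}}$ stand for the adversarial generator-discriminator loss and the confidence-score detection loss, respectively.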


Data availability

No datasets were prepared exclusively for this manuscript.


Funding

Not Applicable.

Author information


Contributions

Both authors contributed equally to this work.

Corresponding author

Correspondence to Palanichamy Naveen.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not Applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Naveen, P., Hassaballah, M. Scene text detection using structured information and an end-to-end trainable generative adversarial networks. Pattern Anal Applic 27, 33 (2024). https://doi.org/10.1007/s10044-024-01259-y

