Scene text detection using structured information and an end-to-end trainable generative adversarial networks
Pattern Analysis and Applications (IF 3.9) Pub Date: 2024-03-19, DOI: 10.1007/s10044-024-01259-y
Palanichamy Naveen, Mahmoud Hassaballah

Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring greater realism and accuracy. The text detection module then identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss: the VAE loss encourages diversity in the generated text regions, the GAN loss enforces realism and accuracy, and the text detection loss drives high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
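The abstract describes the joint objective only at a high level, as L_total = L_VAE + λ_gan · L_GAN + λ_det · L_det. The PyTorch fragment below is a minimal sketch of how such a combined loss is commonly assembled, not the authors' implementation: it assumes a Gaussian-prior ELBO for the VAE term, a non-saturating generator loss for the GAN term, and per-region binary cross-entropy for the detection term; the weights lambda_gan and lambda_det and all function names are hypothetical.

import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # ELBO terms: pixel reconstruction error plus the KL divergence of the
    # approximate posterior N(mu, exp(logvar)) from the unit-Gaussian prior.
    recon = F.mse_loss(x_recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def gan_loss(d_fake_logits):
    # Non-saturating generator objective: push the discriminator's logits on
    # the refined text regions toward the "real" label.
    return F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))

def detection_loss(conf_logits, gt_labels):
    # Per-region confidence scores supervised against text / non-text labels.
    return F.binary_cross_entropy_with_logits(conf_logits, gt_labels)

def joint_loss(x, x_recon, mu, logvar, d_fake_logits, conf_logits, gt_labels,
               lambda_gan=1.0, lambda_det=1.0):
    # Weighted sum of the three terms, minimized over the whole network.
    return (vae_loss(x, x_recon, mu, logvar)
            + lambda_gan * gan_loss(d_fake_logits)
            + lambda_det * detection_loss(conf_logits, gt_labels))

Because training is end to end, gradients from the detection term also flow back through the GAN and VAE modules, which is what couples region generation, refinement, and scoring into a single objective.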




Updated: 2024-03-20