Abstract
This work aims to improve how well neural network models segment images of scientific papers and legal acts. The models are trained with modified loss functions that take into account the specific features of document images in this subject domain. Existing loss functions are analyzed, and new functions are proposed that operate on bounding-box coordinates and also use information about the pixels of the input image. To assess quality, a neural network segmentation model is trained with the modified loss functions. A theoretical assessment is also carried out through a simulation experiment that measures the convergence rate and the segmentation error. The study yields rapidly converging loss functions that improve document image segmentation by exploiting additional information about the input data.
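The abstract does not give the formulas of the proposed loss functions. For context, the following minimal Python sketch shows two standard bounding-box regression losses of the kind such work starts from: GIoU (1 - IoU + |C \ (P ∪ G)| / |C|) and Distance-IoU (1 - IoU + ρ²/c²). It also shows one way a pixel term could be added. The function `ink_miss_penalty`, the binary `mask` input and the weight `lam` are hypothetical illustrations chosen for this sketch; they are not the authors' functions. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area.

```python
"""Illustrative IoU-family box losses plus a HYPOTHETICAL pixel-aware term.

Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2.
"""
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def inter_union(p: Box, g: Box) -> Tuple[float, float]:
    # Overlap rectangle; width/height are clamped at 0 when boxes are disjoint.
    w = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    h = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = w * h
    return inter, area(p) + area(g) - inter


def iou(p: Box, g: Box) -> float:
    inter, union = inter_union(p, g)
    return inter / union if union > 0 else 0.0


def enclosing(p: Box, g: Box) -> Box:
    # Smallest axis-aligned box C that contains both p and g.
    return (min(p[0], g[0]), min(p[1], g[1]), max(p[2], g[2]), max(p[3], g[3]))


def giou_loss(p: Box, g: Box) -> float:
    # GIoU loss: 1 - IoU + |C \ (P u G)| / |C|.
    inter, union = inter_union(p, g)
    c = area(enclosing(p, g))
    return 1.0 - inter / union + (c - union) / c


def diou_loss(p: Box, g: Box) -> float:
    # DIoU loss: 1 - IoU + rho^2 / c^2, where rho is the distance between the
    # box centers and c is the diagonal length of the enclosing box C.
    cx_p, cy_p = (p[0] + p[2]) / 2, (p[1] + p[3]) / 2
    cx_g, cy_g = (g[0] + g[2]) / 2, (g[1] + g[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    e = enclosing(p, g)
    c2 = (e[2] - e[0]) ** 2 + (e[3] - e[1]) ** 2
    return 1.0 - iou(p, g) + rho2 / c2


def ink_miss_penalty(p: Box, g: Box, mask: List[List[int]]) -> float:
    # HYPOTHETICAL pixel-aware term (not from the paper): share of foreground
    # ("ink") pixels inside the ground-truth box g that the prediction p misses.
    # mask[y][x] == 1 marks ink; pixel (x, y) is inside b if b[0] <= x < b[2]
    # and b[1] <= y < b[3].
    def inside(b: Box, x: int, y: int) -> bool:
        return b[0] <= x < b[2] and b[1] <= y < b[3]

    ink_in_gt = missed = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v and inside(g, x, y):
                ink_in_gt += 1
                if not inside(p, x, y):
                    missed += 1
    return missed / ink_in_gt if ink_in_gt else 0.0


def combined_loss(p: Box, g: Box, mask: List[List[int]], lam: float = 1.0) -> float:
    # HYPOTHETICAL combination: box-geometry loss plus a weighted pixel term.
    return diou_loss(p, g) + lam * ink_miss_penalty(p, g, mask)


if __name__ == "__main__":
    p, g = (0, 0, 2, 2), (1, 1, 3, 3)
    print(round(giou_loss(p, g), 4))  # 1.0794
    print(round(diou_loss(p, g), 4))  # 0.9683
```

Both standard losses stay informative when the boxes do not overlap (IoU = 0), which is why they converge faster than a plain 1 - IoU loss. The pixel term as written is piecewise constant in the box coordinates. A version usable for gradient-based training would need soft pixel membership and a tensor library. It is shown only to illustrate the kind of signal that image pixels can add to a coordinate-based loss.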
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by A. Klimontovich
About this article
Cite this article
Perminov, A.I., Turdakov, D.Y. & Belyaeva, O.V. Loss Function for Training Models of Segmentation of Document Images. Program Comput Soft 49, 574–589 (2023). https://doi.org/10.1134/S0361768823070058
DOI: https://doi.org/10.1134/S0361768823070058