Abstract
Billions of documents in data sheet format are shared between organizations across the globe every day. The essential information in these documents is presented in tabular format, and extracting and assimilating it can help organizations make data-driven decisions. Solutions for detecting tables in document images have been well explored. In this work, we therefore propose TableStrRec, a deep learning-based approach that recognizes the structure of detected tables by detecting their rows and columns. TableStrRec comprises two Cascade R-CNN architectures, each with a deformable backbone and Complete IoU loss to improve detection performance. The first architecture detects and classifies rows as regular rows (rows without a merged cell) or irregular rows (groups of regular rows that share a merged cell). The second architecture detects and classifies columns as regular columns (columns without a merged cell) or irregular columns (groups of regular columns that share a merged cell). Both architectures work in parallel to provide the results in a single inference. We show that using TableStrRec to detect these four classes of objects improves table structure recognition performance on three public test sets. On the ICDAR2013 test set, we achieve weighted average F1 scores of \(90.5\%\) and \(89.6\%\) for rows and columns, respectively. On the TabStructDB test set, we achieve weighted average F1 scores of \(72.7\%\) and \(78.5\%\) for rows and columns, respectively. We also evaluate the proposed method on the FinTabNet dataset using the structure-only TEDS score and achieve \(98.34\%\), which outperforms most state-of-the-art benchmark models.
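To illustrate the Complete IoU (CIoU) loss named in the abstract, the following is a minimal sketch of the standard CIoU formulation of Zheng et al. for a single pair of axis-aligned boxes. It is not taken from the TableStrRec implementation: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for this example.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Complete IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: the name, box layout, and eps are assumptions,
    not details of the TableStrRec implementation.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap term: plain intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Distance term: squared center distance, normalized by the squared
    # diagonal of the smallest box enclosing both boxes.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch in width/height ratio.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    # A wide, thin "row-like" prediction shifted slightly below its target.
    print(ciou_loss((0, 0, 100, 10), (0, 2, 100, 12)))
```

Unlike a plain IoU loss, the distance and aspect-ratio terms still provide a training signal when the predicted and ground-truth boxes do not overlap, which is relevant for long, thin objects such as table rows and columns.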
Cite this article
Fernandes, J., Xiao, B., Simsek, M. et al. TableStrRec: framework for table structure recognition in data sheet images. IJDAR (2023). https://doi.org/10.1007/s10032-023-00453-8
DOI: https://doi.org/10.1007/s10032-023-00453-8