BENet: boundary-enhanced network for real-time semantic segmentation

Lei, Xiaochun; Chen, Zeyu; Yu, Zhaoxin; Jiang, Zetao

doi:10.1007/s00371-024-03320-7


Research
Published: 27 March 2024


The Visual Computer


Abstract

In the realm of real-time semantic segmentation, deep neural networks have demonstrated promising potential. However, current methods face challenges when it comes to accurately segmenting object boundaries and small objects. This limitation is partly attributed to the prevalence of convolutional neural networks, which often involve multiple sequential down-sampling operations, resulting in the loss of fine-grained details. To overcome this drawback, we introduce BENet, a real-time semantic segmentation network with a focus on enhancing object boundaries. The proposed BENet integrates two key components: the boundary extraction module (BEM) and the boundary adaption layer (BAL). The proposed BEM efficiently extracts boundary information, while the BAL guides the network using this information to preserve intricate details during the feature extraction process. Furthermore, to address the challenges associated with poor segmentation of elongated objects, we introduce the strip mixed aggregation pyramid pooling module (SMAPPM). This module employs strip pooling kernels to effectively expand the contextual representation and receptive field of the network, thereby enhancing overall segmentation performance. Our experiments conducted on a single RTX 3090 GPU show that our method achieves an mIoU of 79.4% at a speed of 45.5 FPS on the Cityscapes test set without ImageNet pre-training.
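The abstract does not describe the internals of SMAPPM, so the following is only a minimal sketch of the general idea behind strip pooling, not the authors' module. It assumes a single-channel feature map stored as a list of rows; the function name `strip_pool` and the toy input are hypothetical. A 1 x W kernel averages each full row and an H x 1 kernel averages each full column, and the two results are broadcast back and summed, so every output cell receives context from its entire row and column.

```python
from typing import List

Grid = List[List[float]]


def strip_pool(feature: Grid) -> Grid:
    """Average-pool over full rows and full columns, then broadcast and sum.

    This is an illustrative stand-in for strip pooling in general; it is an
    assumption, not the SMAPPM described in the paper.
    """
    height, width = len(feature), len(feature[0])
    # 1 x W strip kernel: one average per row.
    row_means = [sum(row) / width for row in feature]
    # H x 1 strip kernel: one average per column.
    col_means = [
        sum(feature[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    # Broadcast both strips back to H x W and add them.
    return [
        [row_means[r] + col_means[c] for c in range(width)]
        for r in range(height)
    ]


# Toy input: a thin vertical structure (for example, a pole) in column 1.
pole = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
pooled = strip_pool(pole)
```

In this toy case the column strip responds strongly along the whole pole while the row strips stay weak, which illustrates why long, narrow kernels suit elongated objects better than square pooling windows of the same area.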


Data availability

The data supporting the reported results for the Cityscapes dataset can be accessed at https://www.cityscapes-dataset.com/. This dataset is publicly available for research purposes and can be downloaded after registering on the website. The data supporting the reported results for the CamVid dataset is available at http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/. Like Cityscapes, the CamVid dataset is publicly available for research purposes, and access can be obtained by registering on the website.


Funding

This work was supported by the National Natural Science Foundation of China (62172118) and the Nature Science key Foundation of Guangxi (2021GXNSFDA196002), and in part by the Guangxi Key Laboratory of Image and Graphic Intelligent Processing under Grant GIIP2305 and the Student’s Platform for Innovation and Entrepreneurship Training Program under Grants S202310595258 and 202310595026.

Author information

Authors and Affiliations

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541010, Guangxi, China
Xiaochun Lei, Zeyu Chen, Zhaoxin Yu & Zetao Jiang
Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin, 541004, Guangxi, China
Xiaochun Lei & Zetao Jiang

Authors

Xiaochun Lei
Zeyu Chen
Zhaoxin Yu
Zetao Jiang

Contributions

Conceptualization, X.L. and Z.C.; methodology, X.L. and Z.C.; software, Z.C. and Z.Y.; validation, Z.C.; formal analysis, Z.C.; investigation, X.L. and Z.C.; resources, X.L. and Z.J.; data curation, Z.C. and Z.Y.; writing–original draft preparation, Z.C.; writing–review and editing, Z.C., X.L., and Z.J.; visualization, Z.C. and Z.Y.; supervision, Z.J.; project administration, Z.J.; funding acquisition, Z.J. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Zetao Jiang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lei, X., Chen, Z., Yu, Z. et al. BENet: boundary-enhanced network for real-time semantic segmentation. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03320-7


Accepted: 18 February 2024
Published: 27 March 2024
DOI: https://doi.org/10.1007/s00371-024-03320-7
