Abstract
In the realm of real-time semantic segmentation, deep neural networks have demonstrated promising potential. However, current methods face challenges when it comes to accurately segmenting object boundaries and small objects. This limitation is partly attributed to the prevalence of convolutional neural networks, which often involve multiple sequential down-sampling operations, resulting in the loss of fine-grained details. To overcome this drawback, we introduce BENet, a real-time semantic segmentation network with a focus on enhancing object boundaries. The proposed BENet integrates two key components: the boundary extraction module (BEM) and the boundary adaption layer (BAL). The proposed BEM efficiently extracts boundary information, while the BAL guides the network using this information to preserve intricate details during the feature extraction process. Furthermore, to address the challenges associated with poor segmentation of elongated objects, we introduce the strip mixed aggregation pyramid pooling module (SMAPPM). This module employs strip pooling kernels to effectively expand the contextual representation and receptive field of the network, thereby enhancing overall segmentation performance. Our experiments conducted on a single RTX 3090 GPU show that our method achieves an mIoU of 79.4% at a speed of 45.5 FPS on the Cityscapes test set without ImageNet pre-training.
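As an illustration of the strip pooling idea mentioned above, the following is a minimal sketch in plain Python. It is not the SMAPPM itself, whose layer layout is not given here. It assumes a single-channel feature map stored as a list of rows, and the function names `strip_pool` and `strip_context` are hypothetical. Each strip kernel spans a full row or a full column, so every position receives context from its whole row and column, which is why such kernels suit elongated objects.

```python
def strip_pool(feature):
    """Average-pool a 2D map (list of rows) with full-length strip kernels.

    Returns (row_means, col_means):
      row_means[i] is the mean of row i      (an H x 1 strip),
      col_means[j] is the mean of column j   (a 1 x W strip).
    """
    height = len(feature)
    width = len(feature[0])
    row_means = [sum(row) / width for row in feature]
    col_means = [
        sum(feature[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


def strip_context(feature):
    """Broadcast both strips back to H x W and add them.

    This is an illustrative fusion step (an assumption, not the paper's):
    each output position combines its row mean and its column mean.
    """
    row_means, col_means = strip_pool(feature)
    return [[r + c for c in col_means] for r in row_means]


# A thin vertical structure (column 1) on an empty background.
feature = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
context = strip_context(feature)
```

In this toy input the column strip responds strongly along the whole vertical structure, whereas a small square pooling window would mix it with the surrounding background.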
Data availability
The Cityscapes data supporting the reported results can be accessed at https://www.cityscapes-dataset.com/. The dataset is publicly available for research purposes and can be downloaded after registering on the website. The CamVid data supporting the reported results is available at http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/. The CamVid dataset is likewise publicly available for research purposes, and access can be obtained by registering on the website.
Funding
This work was supported by the National Natural Science Foundation of China (62172118) and the Nature Science key Foundation of Guangxi (2021GXNSFDA196002); in part by the Guangxi Key Laboratory of Image and Graphic Intelligent Processing under Grant GIIP2305; and in part by the Student’s Platform for Innovation and Entrepreneurship Training Program under Grants S202310595258 and 202310595026.
Author information
Contributions
Conceptualization, X.L. and Z.C.; methodology, X.L. and Z.C.; software, Z.C. and Z.Y.; validation, Z.C.; formal analysis, Z.C.; investigation, X.L. and Z.C.; resources, X.L. and Z.J.; data curation, Z.C. and Z.Y.; writing–original draft preparation, Z.C.; writing–review and editing, Z.C., X.L., and Z.J.; visualization, Z.C. and Z.Y.; supervision, Z.J.; project administration, Z.J.; funding acquisition, Z.J. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Lei, X., Chen, Z., Yu, Z. et al. BENet: boundary-enhanced network for real-time semantic segmentation. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03320-7
DOI: https://doi.org/10.1007/s00371-024-03320-7