
GHA-Inst: a real-time instance segmentation model utilizing YOLO detection framework

Published in Cluster Computing (2024)

Abstract

The real-time instance segmentation task based on deep learning aims to accurately identify and distinguish all instance objects in images or videos. However, problems such as mutual occlusion between instances and the limited receptive field of the model make accurate, real-time segmentation a formidable challenge. To alleviate these issues, this paper proposes a real-time instance segmentation method with a dual-branch structure, called GHA-Inst. First, we improve the feature fusion module (Neck) and the output end (Head) of the YOLOv7-seg real-time instance segmentation framework to mitigate the accuracy reduction caused by feature loss and to reduce the interference of background noise on the model. Second, we introduce a Global Hybrid-Domain Attention (GHA) module that strengthens the model's focus on salient information while retaining more of the original spatial features, alleviating incomplete segmentation caused by instance occlusion and improving the quality of the generated masks. Finally, our method achieves competitive results on multiple metrics of the MS COCO 2017 and KINS open-source datasets. Compared with the YOLOv7-seg baseline model, GHA-Inst improves the average precision (AP) by 3.4% and 2.6% on the two datasets, respectively.
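
The exact design of the GHA module is not specified in the abstract. As a rough illustration of the general idea it describes, namely combining channel-domain and spatial-domain attention while preserving the original spatial features through a residual path, the following PyTorch sketch shows a hybrid-domain attention block. The class name, reduction ratio, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the paper defines the actual GHA module; this block
# merely combines channel attention (global context) with spatial attention and
# keeps an identity path so the original spatial features are retained.
import torch
import torch.nn as nn


class HybridDomainAttention(nn.Module):
    """Channel + spatial attention with a residual (identity) path."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: squeeze global context, then re-weight channels.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel re-weighting.
        out = x * self.channel_mlp(x)
        # Spatial re-weighting from mean- and max-pooled channel maps.
        pooled = torch.cat(
            [out.mean(dim=1, keepdim=True), out.amax(dim=1, keepdim=True)], dim=1
        )
        out = out * self.spatial_conv(pooled)
        # Residual connection preserves the original spatial features.
        return out + x


if __name__ == "__main__":
    feat = torch.randn(1, 256, 80, 80)   # a typical neck-level feature map
    attn = HybridDomainAttention(256)
    print(attn(feat).shape)              # torch.Size([1, 256, 80, 80])
```

In a YOLO-style detector, a block of this kind would typically be inserted into the feature fusion (Neck) stages that the abstract mentions, so that the attention-weighted features feed both the detection and mask branches.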




Data availability

No datasets were generated or analyzed in this study; therefore, data sharing is not applicable to this article.


Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62172212, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20230031.

Author information


Corresponding author

Correspondence to Liyan Zhang.

Ethics declarations

Conflict of interest

We have conducted a thorough assessment of both financial and non-financial affiliations that could potentially create a conflict of interest with the research presented. We unequivocally declare that no conflicts of interest have been identified that could in any way introduce bias or influence the outcomes of our study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dong, C., Tang, Y. & Zhang, L. GHA-Inst: a real-time instance segmentation model utilizing YOLO detection framework. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04373-y

