A transformer-based UAV instance segmentation model TF-YOLOv7

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

In dense target scenarios in real cities, the key challenge of UAV instance segmentation is to label different targets efficiently while overcoming the mutual occlusion that dense targets cause. To address this occlusion problem, this paper proposes TF-YOLOv7, a model for UAV instance segmentation. The model introduces the Swin Transformer structure into the backbone network to construct hierarchical feature maps by fusing deep network feature blocks, which is well suited to the dense recognition task of instance segmentation. In addition, the Bottleneck Transformer structure is introduced in the detection stage: convolution extracts abstract information from the underlying features, and the higher-level information obtained through the convolution layers is then processed by a self-attention mechanism, which handles large-resolution images effectively. Finally, the Focal-EIoU loss function is introduced to further optimize masking performance on mutually occluded small targets and improve segmentation of occluded targets. In experiments on the UAV aerial photography dataset VisDroneDET, the proposed model achieves a 2.2% performance improvement over the benchmark model YOLOv7, demonstrating that it is suitable for UAV instance segmentation tasks.
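To make the loss term concrete, below is a minimal sketch of the Focal-EIoU formulation the abstract refers to, written for a single pair of axis-aligned boxes in `(x1, y1, x2, y2)` format. This follows the published Focal-EIoU definition (IoU-weighted EIoU); the paper's exact implementation, the `gamma` value, and the box format are assumptions, not taken from this article.

```python
def focal_eiou_loss(pred, gt, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss: IoU**gamma * (1 - IoU + centre term + width/height terms)."""
    # Intersection area of the two boxes
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih

    # IoU = intersection / union
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)

    # Smallest enclosing box, used to normalise the penalty terms
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Squared distance between box centres
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2

    # EIoU = IoU term + centre-distance term + width term + height term
    eiou = (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2 / (cw ** 2 + eps)
            + ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2 / (ch ** 2 + eps))

    # Focal weighting: IoU**gamma down-weights low-overlap (easy negative) boxes
    return iou ** gamma * eiou

# A perfectly matched box incurs (near-)zero loss; a shifted box is penalised.
print(focal_eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # ~0.0
print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # > 0
```

The width and height penalty terms are what distinguish EIoU from CIoU: they regress box dimensions directly rather than via an aspect-ratio term, which is why the abstract credits this loss with sharper masks on small, occluded targets.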



Availability of data and materials

The data that support the findings of this study are openly available in the VisDrone/VisDrone-Dataset repository at https://github.com/VisDrone/VisDrone-Dataset. The original image data come from this repository; we then combined the Label Studio tool with the SAM model to generate segmentation labels (masks). The VisDrone dataset with these segmentation labels was used in this experiment.

Funding

This work is supported by Chongqing Natural Science Foundation (CSTB2022NSCQ-MSX1415).

Author information

Corresponding author

Correspondence to Li Tan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tan, L., Liu, Z., Huang, X. et al. A transformer-based UAV instance segmentation model TF-YOLOv7. SIViP 18, 3299–3308 (2024). https://doi.org/10.1007/s11760-023-02992-3

