A transformer-based UAV instance segmentation model TF-YOLOv7

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

In dense target scenarios in real cities, the key challenge of UAV instance segmentation is to label different targets efficiently while overcoming the mutual occlusion that dense targets cause. To address this occlusion problem, this paper proposes TF-YOLOv7, a model for UAV instance segmentation. The model introduces the Swin Transformer structure into the backbone network to construct hierarchical feature maps by fusing deep network feature blocks, which is well suited to the dense recognition task of instance segmentation. In addition, the Bottleneck Transformer structure is introduced in the detection stage: convolution extracts abstract information from the underlying features, and the higher-level information obtained through the convolution layers is then processed by a self-attention mechanism, which handles large-resolution images effectively. Finally, the Focal-EIoU loss function is introduced to further optimize masking performance on mutually occluded small targets and improve segmentation of occluded targets. In experiments on the UAV aerial photography dataset VisDroneDET, the proposed model achieves a 2.2% performance improvement over the benchmark model YOLOv7, demonstrating that it is suitable for UAV instance segmentation tasks.
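To make the loss term concrete, below is a minimal sketch of the Focal-EIoU formulation the abstract refers to, written for a single pair of axis-aligned boxes in `(x1, y1, x2, y2)` format. This follows the published Focal-EIoU definition (IoU-weighted EIoU); the paper's exact implementation, the `gamma` value, and the box format are assumptions, not taken from this article.

```python
def focal_eiou_loss(pred, gt, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss: IoU**gamma * (1 - IoU + centre term + width/height terms)."""
    # Intersection area of the two boxes
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih

    # IoU = intersection / union
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)

    # Smallest enclosing box, used to normalise the penalty terms
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Squared distance between box centres
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2

    # EIoU = IoU term + centre-distance term + width term + height term
    eiou = (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2 / (cw ** 2 + eps)
            + ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2 / (ch ** 2 + eps))

    # Focal weighting: IoU**gamma down-weights low-overlap (easy negative) boxes
    return iou ** gamma * eiou

# A perfectly matched box incurs (near-)zero loss; a shifted box is penalised.
print(focal_eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # ~0.0
print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # > 0
```

The width and height penalty terms are what distinguish EIoU from CIoU: they regress box dimensions directly rather than via an aspect-ratio term, which is why the abstract credits this loss with sharper masks on small, occluded targets.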



Availability of data and materials

The data that support the findings of this study are openly available in the VisDrone/VisDrone-Dataset repository at https://github.com/VisDrone/VisDrone-Dataset. The original image data come from this repository; we then combined the Label Studio tool with the SAM model to generate segmentation labels (masks). The VisDrone dataset with these segmentation labels was used in this experiment.

Funding

This work is supported by Chongqing Natural Science Foundation (CSTB2022NSCQ-MSX1415).

Author information

Corresponding author

Correspondence to Li Tan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tan, L., Liu, Z., Huang, X. et al. A transformer-based UAV instance segmentation model TF-YOLOv7. SIViP 18, 3299–3308 (2024). https://doi.org/10.1007/s11760-023-02992-3

