Transformer-Based Multi-layer Feature Aggregation and Rotated Anchor Matching for Oriented Object Detection in Remote Sensing Images

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Object detection has made significant progress in computer vision, but detecting small, arbitrarily oriented, and densely distributed objects remains challenging, especially in aerial remote sensing images. This paper presents MATDet, an end-to-end Transformer-based encoder-decoder network designed for oriented object detection. The network combines multi-layer feature aggregation with rotated anchor matching to improve detection accuracy for small, oriented, and densely distributed objects. Specifically, the encoder encodes labeled image blocks from convolutional neural network (CNN) feature maps and fuses them with higher-resolution multi-scale features through cross-layer connections, which facilitates the extraction of global contextual information. The decoder then upsamples the encoded features to recover the full spatial resolution of the feature maps, capturing the local–global semantic features needed for accurate object localization. In addition, high-quality anchor box proposals are generated by refined convolution, and the convolved features are adaptively aligned to the anchor boxes to reduce redundant computation. MATDet achieves mAPs of 80.35%, 78.83%, 73.60%, and 98.01% on the DOTAv1.0, DOTAv1.5, DIOR, and HRSC2016 datasets, respectively, outperforming the baseline model for oriented object detection. These results support the feasibility and effectiveness of the proposed methods.
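
The abstract does not state which overlap criterion the rotated anchor matching uses. A common choice in oriented object detection is the intersection over union (IoU) between rotated boxes, and the sketch below illustrates that idea only; it is not MATDet's implementation. It assumes boxes are given as (cx, cy, w, h, theta) with theta in radians, and the function names (`obb_corners`, `clip`, `rotated_iou`) are hypothetical. The intersection polygon is computed with Sutherland-Hodgman clipping, which is valid here because both boxes are convex.

```python
import math

def obb_corners(cx, cy, w, h, theta):
    """Corners of an oriented box, counter-clockwise; theta in radians (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def polygon_area(pts):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    n = len(pts)
    return 0.5 * sum(
        pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
        for i in range(n)
    )

def clip(subject, a, b):
    """Keep the part of polygon `subject` lying to the left of the directed edge a->b."""
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    out = []
    for i in range(len(subject)):
        p, q = subject[i], subject[(i + 1) % len(subject)]
        sp, sq = side(p), side(q)
        if sp >= 0:                      # p is inside or on the clipping edge
            out.append(p)
        if sp * sq < 0:                  # segment p->q strictly crosses the edge
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rotated_iou(box_a, box_b):
    """IoU of two oriented boxes given as (cx, cy, w, h, theta)."""
    pa, pb = obb_corners(*box_a), obb_corners(*box_b)
    inter = pa
    for i in range(4):                   # clip box A against each edge of box B
        if not inter:
            return 0.0
        inter = clip(inter, pb[i], pb[(i + 1) % 4])
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter_area
    return inter_area / union if union > 0 else 0.0
```

Under this sketch, an anchor would be assigned to a ground-truth box when `rotated_iou` exceeds a chosen threshold; the threshold value is a design choice that the abstract does not specify.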

Code Availability

The code for the current study is available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the National Social Science Foundation of China under Grant 21BTJ071.

Funding

This work was supported by the National Social Science Foundation of China under Grant 21BTJ071.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by CJ, AZ and ZW. The first draft of the manuscript was written by CJ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chuan Jin.

Ethics declarations

Conflict of interest

All the authors certify that there is no conflict of interest with any individual or organization for this work.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

The participant has consented to the submission of the case report to the journal.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, C., Zheng, A., Wu, Z. et al. Transformer-Based Multi-layer Feature Aggregation and Rotated Anchor Matching for Oriented Object Detection in Remote Sensing Images. Arab J Sci Eng (2024). https://doi.org/10.1007/s13369-024-08892-z

  • DOI: https://doi.org/10.1007/s13369-024-08892-z