Abstract
Object detection has made significant progress in computer vision. However, detecting small, arbitrarily oriented, and densely distributed objects remains challenging, especially in aerial remote sensing images. This paper presents MATDet, an end-to-end Transformer-based encoder-decoder network designed for oriented object detection. The network employs multi-layer feature aggregation and rotated anchor matching to improve detection accuracy for small, oriented, and densely distributed objects. Specifically, the encoder encodes labeled image blocks using convolutional neural network (CNN) feature maps and fuses these blocks with higher-resolution multi-scale features through cross-layer connections, facilitating the extraction of global contextual information. The decoder then upsamples the encoded features, recovering the full spatial resolution of the feature maps to capture the local–global semantic features needed for accurate object localization. In addition, high-quality anchor box proposals are generated by refined convolution, and the convolved features are adaptively aligned to the anchor boxes to reduce redundant computation. The proposed MATDet achieves mAPs of 80.35%, 78.83%, 73.60%, and 98.01% on the DOTAv1.0, DOTAv1.5, DIOR, and HRSC2016 datasets, respectively, outperforming the baseline model for oriented object detection. These results support the feasibility and effectiveness of the proposed methods.
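For readers unfamiliar with oriented object detection, the following minimal sketch illustrates how a rotated box can be expanded into its four corner points. The parameterization (center, width, height, angle in radians) and the function names `obb_to_corners` and `polygon_area` are assumptions chosen for illustration; the abstract does not state which box representation MATDet uses.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box as (x, y) tuples.

    Assumed parameterization (not taken from the paper): (cx, cy) is the
    box center, w and h are the side lengths, and theta is the rotation
    angle in radians, counter-clockwise in a y-up coordinate frame.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the axis-aligned box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate by the center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


if __name__ == "__main__":
    corners = obb_to_corners(10.0, 5.0, 4.0, 2.0, math.radians(30))
    print(corners)
    print(polygon_area(corners))
```

Rotation preserves area, so `polygon_area` applied to the corners should return `w * h` up to floating-point error, which gives a quick sanity check on the conversion.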
Code Availability
The code for the current study is available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by the National Social Science Foundation of China under Grant 21BTJ071.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by CJ, AZ and ZW. The first draft of the manuscript was written by CJ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
All the authors certify that there is no conflict of interest with any individual or organization for this work.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Consent for Publication
The participant has consented to the submission of the case report to the journal.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jin, C., Zheng, A., Wu, Z. et al. Transformer-Based Multi-layer Feature Aggregation and Rotated Anchor Matching for Oriented Object Detection in Remote Sensing Images. Arab J Sci Eng (2024). https://doi.org/10.1007/s13369-024-08892-z
DOI: https://doi.org/10.1007/s13369-024-08892-z