Skip to main content
Log in

Regional filtering distillation for object detection

  • Original Paper
  • Published:
Machine Vision and Applications Aims and scope Submit manuscript

Abstract

Knowledge distillation is a common and effective method in model compression, which trains a compact student model to mimic the capability of a large teacher model to get superior generalization. Previous works on knowledge distillation are underperforming for challenging tasks such as object detection, compared to the general application of unsophisticated classification tasks. In this paper, we propose that the failure of knowledge distillation on object detection is mainly caused by the imbalance between features of informative and invalid background. Not all background noise is redundant, and the valuable background noise after screening contains relations between foreground and background. Therefore, we propose a novel regional filtering distillation (RFD) algorithm to solve this problem through two modules: region selection and attention-guided distillation. Region selection first filters massive invalid backgrounds and retains knowledge-dense regions on near object anchor locations. Attention-guided distillation further improves distillation performance on object detection tasks by extracting the relations between foreground and background to migrate key features. Extensive experiments on both one-stage and two-stage detectors have been conducted to prove the effectiveness of RFD. For example, RFD improves 2.8% and 2.6% mAP for ResNet50-RetinaNet and ResNet50-FPN student networks on the MS COCO dataset, respectively. We also evaluate our method with the Faster R-CNN model on Pascal VOC and KITTI benchmark, which obtain 1.52% and 4.36% mAP promotions for the ResNet18-FPN student network, respectively. Furthermore, our method increases 5.70% of mAP for MobileNetv2-SSD compared to the original model. The proposed RFD technique performs highly on detection tasks through regional filtering distillation. In the future, we plan to extend it to more challenging task scenarios, such as segmentation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Algorithm 1
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Shen, D., Wu, G., Suk, H.-I.: Deep learning in medical image analysis. Ann. Rev. Biomed. Eng. 19, 221–248 (2017)

    Article  Google Scholar 

  2. Chen, J., Li, K., Deng, Q., Li, K., Yu, P.S.: Distributed deep learning model for intelligent video surveillance systems with edge computing. IEEE Trans. Ind. Inf. (2019)

  3. Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)

  4. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: Proceedings of the European Conference on Computer Vision, pp. 21–37 (2016)

  5. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  6. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  7. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Proceedings of the European Conference on Computer Vision, pp. 213–229 (2020)

  8. Li, Y., Mao, H., Girshick, R.B., He, K.: Exploring plain vision transformer backbones for object detection. In: Proceedings of the European Conference on Computer Vision, pp. 280–296 (2022)

  9. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications (2017). arXiv preprint arXiv:1704.04861

  10. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848–6856 (2018)

  11. He, Y., Liu, P., Wang, Z., Hu, Z., Yang, Y.: Filter pruning via geometric median for deep convolutional neural networks acceleration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4340–4349 (2019)

  12. Lin, M., Ji, R., Wang, Y., Zhang, Y., Zhang, B., Tian, Y., Shao, L.: Hrank: filter pruning using high-rank feature map. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1529–1538 (2020)

  13. Chen, W., Wilson, J.T., Tyree, S., Weinberger, K.Q., Chen, Y.: Compressing convolutional neural networks (2015). arXiv preprint arXiv:1506.04449

  14. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)

  15. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015). arXiv preprint arXiv:1503.02531

  16. Mirzadeh, S., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., Ghasemzadeh, H.: Improved knowledge distillation via teacher assistant. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5191–5198 (2020)

  17. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer (2016). arXiv preprint arXiv:1612.03928

  18. Wang, T., Yuan, L., Zhang, X., Feng, J.: Distilling object detectors with fine-grained feature imitation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4933–4942 (2019)

  19. Guo, J., Han, K., Wang, Y., Wu, H., Chen, X., Xu, C., Xu, C.: Distilling object detectors via decoupled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2154–2164 (2021)

  20. Zhang, L., Ma, K.: Improve object detection with feature-based knowledge distillation: towards accurate and efficient detectors. In: Proceedings of the 9th International Conference on Learning Representations (2020)

  21. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Proc. IEEE Int. J. Comput. Vis. 104(2), 154–171 (2013)

    Article  Google Scholar 

  22. Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017)

  23. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  24. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  25. Bayraktar, E., Wang, Y., Bue, A.D.: Fast re-obj: real-time object re-identification in rigid scenes. Mach. Vis. Appl. 33(6), 97 (2022)

  26. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  27. Chen, G., Choi, W., Yu, X., Han, T.X., Chandraker, M.: Learning efficient object detection models with knowledge distillation. In: Advances in Neural Information Processing Systems, pp. 742–751 (2017)

  28. Li, Q., Jin, S., Yan, J.: Mimicking very efficient network for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6356–6364 (2017)

  29. Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 535–541 (2006)

  30. Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7130–7138 (2017)

  31. Heo, B., Lee, M., Yun, S., Choi, J.Y.: Knowledge distillation with adversarial samples supporting decision boundary. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3771–3778 (2019)

  32. Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3967–3976 (2019)

  33. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C., Ma, K.: Be your own teacher: improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3713–3722 (2019)

  34. Sun, R., Tang, F., Zhang, X., Xiong, H., Tian, Q.: Distilling object detectors with task adaptive regularization (2020). arXiv preprint arXiv:2006.13108

  35. Dai, X., Jiang, Z., Wu, Z., Bao, Y., Wang, Z., Liu, S., Zhou, E.: General instance distillation for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7842–7851 (2021)

  36. Dong, N., Zhang, Y., Ding, M., Xu, S., Bai, Y.: One-stage object detection knowledge distillation via adversarial learning. Appl. Intell. 52(4), 4582–4598 (2022)

    Article  Google Scholar 

  37. DeVries, T., Misra, I., Wang, C., van der Maaten, L.: Does object recognition work for everyone. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)

  38. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)

  39. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

    Article  Google Scholar 

  40. Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: Proceedings of the European Conference on Computer Vision, pp. 740–755 (2014)

  41. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)

  42. Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: Mmdetection: Open mmlab detection toolbox and benchmark (2019). arXiv preprint arXiv:1906.07155

  43. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)

  44. Zheng, Z., Ye, R., Wang, P., Ren, D., Zuo, W., Hou, Q., Cheng, M.: Localization distillation for dense object detection. In: Proceedings of the Conference on Computer Vision and Pattern Recognition, pp. 9397–9406 (2022)

  45. Bayraktar, E., Yigit, C.B., Boyraz, P.: A hybrid image dataset toward bridging the gap between real and simulation environments for robotics. Mach. Vis. Appl. 30(1), 23–40 (2019)

Download references

Acknowledgements

This work was supported by the Natural Science Foundation of Jiangsu Province of China (BK20222012), Guangxi Science and Technology Project (AB22080026/2021AB22167), and National Natural Science Foundation of China (No. 61375021)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ningzhong Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, P., Zhang, J., Sun, H. et al. Regional filtering distillation for object detection. Machine Vision and Applications 35, 24 (2024). https://doi.org/10.1007/s00138-023-01503-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00138-023-01503-1

Keywords

Navigation