
MLANet: multi-level attention network with multi-scale feature fusion for crowd counting

Cluster Computing

Abstract

Crowd counting is the task of estimating the number of people in a given scene. The field has recently attracted significant attention, and many innovative methods have emerged. However, intense scale variation and background interference continue to make crowd counting in realistic scenes challenging. To address these two problems, this paper proposes MLANet, a multi-level attention network with multi-scale feature fusion. The network consists of three parts: a front-end network that extracts multi-level base features, a mid-end network that performs centralized dilated multi-scale feature fusion with a global attention module, and a back-end network that generates density maps. By combining a flexible attention module with multi-scale features, the method captures crowd information at different scales and produces accurate counts. We evaluated the method on four public datasets (UCF_CC_50, ShanghaiTech, WorldExpo’10, and Beijing BRT), and the experimental results demonstrate a significant reduction in counting error compared with existing methods.
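The back-end described above outputs a density map rather than a single number, so it may help to see how a density map relates to a count. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common convention of placing one unit-mass Gaussian at each annotated head position, so that summing the map recovers the number of people. The function names, the kernel size, the value of sigma, and the use of mean absolute error as the counting-error measure are illustrative assumptions and are not taken from the article.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per head (row, col).

    size and sigma are hypothetical defaults chosen for illustration.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        # Keep only the part of the kernel that falls inside the image,
        # then renormalize so each person still contributes exactly 1.
        cells = []
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy][dx]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count_from_density(dmap):
    """The estimated count is the sum over all pixels of the density map."""
    return sum(map(sum, dmap))


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Four annotated heads, two of them on the image border.
heads = [(2, 3), (10, 10), (0, 0), (15, 19)]
dmap = density_map(16, 20, heads)
estimated = count_from_density(dmap)
```

Here `estimated` equals the number of annotated heads up to floating-point rounding, which is why a network trained to regress such maps can be scored by comparing the sum of its output with the true count.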



Data availability

Contact the corresponding author to obtain access to the labeled data set used in this article.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62067002 and 62062033), in part by the Science and Technology Project of the Transportation Department of Jiangxi Province, China (No. 2022X0040), and in part by the Natural Science Foundation of Jiangxi Province (No. 20232BAB202018).

Funding

National Natural Science Foundation of China (62067002, 62062033); Science and Technology Project of the Transportation Department of Jiangxi Province (2022X0040); Natural Science Foundation of Jiangxi Province (20232BAB202018).

Author information

Contributions

LX and XH directed the design of the model proposed in the paper. YZ and ZL performed the experiments. PH plotted Figures 6 to 9. YZ wrote the paper and prepared the remaining figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yijuan Zeng.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests related to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, L., Zeng, Y., Huang, X. et al. MLANet: multi-level attention network with multi-scale feature fusion for crowd counting. Cluster Comput (2024). https://doi.org/10.1007/s10586-024-04326-5

  • DOI: https://doi.org/10.1007/s10586-024-04326-5
