
Target–distractor memory joint tracking algorithm via Credit Allocation Network

  • Research
  • Published in: Machine Vision and Applications

Abstract

Tracking frameworks based on memory networks have gained significant attention for their enhanced adaptability to variations in target appearance. However, their performance is limited by the negative effects of distractors in the background. This paper therefore proposes a tracking method that jointly exploits target and distractor memory via a Credit Allocation Network (CAN), which is updated online with a Guided Focus Loss. By learning features of the target object, the CAN assigns credit scores to tracking results, ensuring that only reliable samples are stored in the memory pool. Furthermore, we construct a multi-domain memory model that simultaneously captures target and background information over multiple historical intervals, which builds a more compatible object appearance model while increasing the diversity of memory samples. Finally, a novel target–distractor joint localization strategy reads target and distractor information from memory frames via cross-attention, so that erroneous responses in the target response map are canceled out by the distractor response map. Experimental results on the OTB-2015, GOT-10k, UAV123, LaSOT, and VOT-2018 datasets show the competitiveness and effectiveness of the proposed method compared with other trackers.
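The joint-localization idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, feature shapes, and the cancellation weight `lam` are all hypothetical. The sketch reads target and distractor values from memory with cross-attention, then subtracts a weighted distractor response from the target response.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_readout(query, keys, values):
    """Cross-attention read: each query location attends over memory entries.

    query: (N, d) search-region features; keys/values: (M, d) memory entries.
    """
    attn = softmax(query @ keys.T / np.sqrt(query.shape[-1]), axis=-1)  # (N, M)
    return attn @ values                                                # (N, d)

def joint_response(query, tgt_mem, dis_mem, lam=0.5):
    """Target response map minus a weighted distractor response map."""
    t_read = memory_readout(query, *tgt_mem)
    d_read = memory_readout(query, *dis_mem)
    # Correlating each read-out with the query gives a per-location response.
    t_resp = (t_read * query).sum(-1)
    d_resp = (d_read * query).sum(-1)
    return t_resp - lam * d_resp

# Toy shapes: 16 search locations, 8-dim features, 4 memory entries each.
rng = np.random.default_rng(0)
d = 8
query = rng.standard_normal((16, d))
tgt_mem = (rng.standard_normal((4, d)), rng.standard_normal((4, d)))
dis_mem = (rng.standard_normal((4, d)), rng.standard_normal((4, d)))
resp = joint_response(query, tgt_mem, dis_mem)
print(resp.shape)  # (16,)
```

The location with the largest value in `resp` would then be taken as the target position; responses that a distractor also explains are suppressed by the subtraction.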



Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 62272423, 62072416, 62102373, and 62006213; the Program for Science & Technology Innovation Talents in Universities of Henan Province, China (21HASTIT028); the Natural Science Foundation of Henan Province, China (202300410495); and the Zhongyuan Science and Technology Innovation Leadership Program, China (214200510026).


Author information


Contributions

Conceptualization was performed by HZ and PW; methodology by HZ and ZC; implementation of the computer code and support by JZ; investigation and formal analysis by LL; writing—original draft preparation by PW; writing—review and editing by all authors.

Corresponding author

Correspondence to Zhiwu Chen.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, H., Wang, P., Chen, Z. et al. Target–distractor memory joint tracking algorithm via Credit Allocation Network. Machine Vision and Applications 35, 28 (2024). https://doi.org/10.1007/s00138-024-01508-4

