Video Summarization Using Knowledge Distillation-Based Attentive Network

Qin, Jialin; Yu, Hui; Liang, Wei; Ding, Derui

doi:10.1007/s12559-023-10243-3

Published: 11 January 2024

Cognitive Computation

Jialin Qin¹, Hui Yu¹,², Wei Liang¹ & Derui Ding¹

Abstract

The vast volumes of video produced daily call for efficient ways to ensure that key information is captured for review and storage, which has driven the popularity of video summarization techniques. Deep learning has shown its advantages in video summarization, especially convolutional neural networks, which are effective at extracting features. However, deep network layers and a limited range of temporal dependence make such networks difficult to deploy and reduce the accuracy of identifying important video frames. To tackle these issues, we present a knowledge distillation-based attentive network (KDAN) for supervised video summarization. Inspired by education and learning processes in biology, the proposed method separates the fully convolutional network from the attention mechanism and uses the fully convolutional network as a teacher network to guide the learning of a student network built on an attention mechanism. The resulting lightweight network draws on the knowledge learned by both networks, thus addressing the explosion in the number of parameters and the slow training. We conducted experiments on two widely used public benchmarks, SumMe and TVSum. In the Canonical setting, DANtea achieves F-scores of 53.09 and 60.30, and DAN achieves F-scores of 51.26 and 61.55, on the SumMe and TVSum datasets, respectively. These results demonstrate the effectiveness and superiority of the proposed network over existing state-of-the-art methods.
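The teacher-student idea in the abstract can be illustrated with a minimal sketch. The preview does not give the paper's actual loss function, so the following is an assumption: a generic distillation objective for frame-level importance scores that blends a supervised term (student versus ground-truth labels) with an imitation term (student versus teacher outputs). The function names `mse` and `distillation_loss` and the weight `alpha` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length score sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    labels: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Blend a supervised loss with a teacher-imitation loss.

    Hypothetical formulation, not taken from the paper:
    alpha weights the ground-truth term, (1 - alpha) the teacher term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    supervised = mse(student, labels)   # student vs. annotated frame scores
    imitation = mse(student, teacher)   # student vs. teacher's frame scores
    return alpha * supervised + (1.0 - alpha) * imitation


# Per-frame importance scores for a toy four-frame video.
labels = [1.0, 0.0, 0.0, 1.0]
teacher_scores = [0.9, 0.1, 0.2, 0.8]
student_scores = [0.7, 0.2, 0.1, 0.6]
loss = distillation_loss(student_scores, teacher_scores, labels, alpha=0.5)
```

With `alpha = 1.0` the student is trained on the labels alone, and with `alpha = 0.0` it only imitates the teacher. Intermediate values let the lightweight student draw on both sources, which is the general motivation the abstract describes.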

Data Availability

The experiments in this study were conducted using publicly available datasets. The datasets are available at https://github.com/KaiyangZhou/pytorch-vsumm-reinforce

Author information

Authors and Affiliations

Department of Control Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
Jialin Qin, Hui Yu, Wei Liang & Derui Ding
School of Creative Technologies, University of Portsmouth, Portsmouth, PO1 2DJ, UK
Hui Yu

Corresponding author

Correspondence to Hui Yu.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qin, J., Yu, H., Liang, W. et al. Video Summarization Using Knowledge Distillation-Based Attentive Network. Cogn Comput (2024). https://doi.org/10.1007/s12559-023-10243-3

Received: 04 September 2022
Accepted: 18 December 2023
Published: 11 January 2024
DOI: https://doi.org/10.1007/s12559-023-10243-3
