DA-ResNet: dual-stream ResNet with attention mechanism for classroom video summary, Pattern Analysis and Applications

Pattern Analysis and Applications (IF 3.9), Pub Date: 2024-03-14, DOI: 10.1007/s10044-024-01256-1
Yuxiang Wu, Xiaoyan Wang, Tianpan Chen, Yan Dou

Generating video summaries that are both diverse and representative is important when dealing with massive volumes of video. In this paper, a convolutional neural network based on a dual-stream attention mechanism (DA-ResNet) is designed to obtain candidate summary sequences for classroom scenes. DA-ResNet takes a dual-stream input, an image frame sequence and an optical flow frame sequence, to enhance its representational ability, and embeds an attention mechanism into ResNet. The final video summary is then obtained by removing redundant frames with an improved hash clustering algorithm: preprocessing is performed first to reduce computational complexity, and hash clustering then retains the frame with the highest entropy value in each class and removes the other similar frames. To verify the method's effectiveness in classroom scenes, we also created ClassVideo, a real dataset consisting of 45 videos from the normal teaching environment of our school. Experimental results show that the proposed method is competitive: DA-ResNet outperforms existing methods by about 8% in terms of F-measure. In addition, the visual results demonstrate its ability to produce classroom video summaries that are very close to human preferences.
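The redundancy-removal step described in the abstract (group similar frames by hash, keep the highest-entropy frame in each group) can be illustrated with a rough sketch. The abstract does not specify the "improved" hash clustering algorithm or its preprocessing, so everything below is an assumption made for illustration: a plain average hash, greedy grouping by Hamming distance, Shannon entropy over gray levels, and the hypothetical names `average_hash`, `hamming`, `entropy`, `summarize`, and `max_distance`. Frames are assumed to be grayscale 2D lists of integers at least 8 pixels on each side.

```python
import math
from collections import Counter


def average_hash(frame, size=8):
    """Average hash (an assumed stand-in for the paper's hash):
    block-average the frame to size x size cells, then threshold
    each cell against the mean of all cells."""
    h, w = len(frame), len(frame[0])
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(1 if v > mean else 0 for v in cells)


def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def entropy(frame):
    """Shannon entropy (in bits) of the frame's gray-level histogram."""
    counts = Counter(p for row in frame for p in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def summarize(frames, max_distance=5):
    """Greedy grouping: a frame joins the first cluster whose
    representative hash is within max_distance, else it starts a new
    cluster. Returns the index of the highest-entropy frame per cluster."""
    clusters = []  # list of (representative hash, member indices)
    for idx, frame in enumerate(frames):
        frame_hash = average_hash(frame)
        for rep, members in clusters:
            if hamming(rep, frame_hash) <= max_distance:
                members.append(idx)
                break
        else:
            clusters.append((frame_hash, [idx]))
    return sorted(
        max(members, key=lambda i: entropy(frames[i]))
        for _, members in clusters
    )


if __name__ == "__main__":
    # Two near-duplicate frames (dark left, bright right) and one
    # different frame (dark top, bright bottom).
    flat = [[0] * 4 + [255] * 4 for _ in range(8)]
    textured = [[0, 10, 0, 10, 255, 245, 255, 245] for _ in range(8)]
    other = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
    print(summarize([flat, textured, other]))
```

In this toy input the first two frames share a hash, and the second is kept because its gray-level histogram carries more entropy. The threshold `max_distance` is an arbitrary illustrative value, not one taken from the paper.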


Updated: 2024-03-15
