MDJ: A multi-scale difference joint keyframe extraction algorithm for infrared surveillance video action recognition, Digital Signal Processing



Digital Signal Processing (IF 2.9), Pub Date: 2024-03-15, DOI: 10.1016/j.dsp.2024.104469
Zhiqiang Feng, Xiaogang Wang, Jiayi Zhou, Xin Du

Many action recognition methods need substantial computational resources to achieve good results on unedited videos, and their performance on infrared videos, which carry less information, is often unsatisfactory. In this paper, we propose a multi-scale difference joint (MDJ) keyframe extraction algorithm for action recognition in infrared surveillance videos. To evaluate the algorithm, we built a dataset of 1200 unedited infrared surveillance videos covering 10 action categories. The algorithm extracts keyframes by jointly analyzing the global and local differences between adjacent frames. Experiments show that, using only 10 frames, the algorithm improves the accuracy of generic action recognition algorithms by more than 10% on both our self-built dataset and the Infrared-Visible dataset. The proposed method also achieves high recognition accuracy at minimal computational cost, even with a small number of frames: it outperforms state-of-the-art methods by 1.82% on the Infrared-Visible dataset and by 1.35% on the InfAR dataset. These results indicate that the algorithm is effective as a preprocessing module that selects frames before they are passed to a generic action recognition model, significantly improving that model's accuracy.
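
The abstract says only that keyframes are chosen by jointly analyzing global and local differences between adjacent frames; it does not give the formulas. The following is a minimal sketch of that general idea, not the authors' MDJ implementation. Everything specific in it is an assumption made for illustration: the function names, the mean-absolute-difference measure, the single tile size (the paper's multi-scale scheme is not reproduced), the weighted sum that combines the two scores, and the top-k selection. Frames are plain nested lists of grayscale intensities so the example runs without third-party packages.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (hypothetical representation).
Frame = Sequence[Sequence[float]]


def global_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference over the whole frame."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def local_difference(a: Frame, b: Frame, block: int = 2) -> float:
    """Largest mean absolute difference over non-overlapping block x block tiles."""
    height, width = len(a), len(a[0])
    best = 0.0
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile_sum = sum(abs(a[r][c] - b[r][c]) for r in rows for c in cols)
            best = max(best, tile_sum / (len(rows) * len(cols)))
    return best


def select_keyframes(frames: Sequence[Frame], k: int,
                     weight: float = 0.5, block: int = 2) -> List[int]:
    """Indices of the k frames that differ most from their predecessor.

    The joint score is an assumed weighted sum of the global and local
    differences; indices are returned in temporal order.
    """
    scores = []
    for i in range(1, len(frames)):
        g = global_difference(frames[i - 1], frames[i])
        l = local_difference(frames[i - 1], frames[i], block)
        scores.append((weight * g + (1.0 - weight) * l, i))
    # Highest score first; ties broken by earlier frame index.
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(i for _, i in top)


# Toy clip: a small local change at frame 2, a frame-wide change at frame 4.
blank = [[0.0] * 4 for _ in range(4)]
spot = [row[:] for row in blank]
spot[0][0] = 8.0
bright = [[4.0] * 4 for _ in range(4)]
frames = [blank, blank, spot, spot, bright]
print(select_keyframes(frames, k=2))  # [2, 4]
```

In this toy clip the single changed pixel at frame 2 barely moves the global score (0.5) but stands out in its tile (2.0), which is the kind of small-motion case a local term is meant to catch. The 10-frame setting reported in the abstract would correspond to `k=10` in this sketch.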


Updated: 2024-03-15
