Temporal adaptive feature pyramid network for action detection
Computer Vision and Image Understanding ( IF 4.5 ) Pub Date : 2024-01-24 , DOI: 10.1016/j.cviu.2024.103945
Xuezhi Xiang , Hang Yin , Yulong Qiao , Abdulmotaleb El Saddik

Detecting actions in videos has become a prominent research task because of its wide range of applications. In addition to recognizing the action category, the task requires localizing the start and end time of each action instance, which demands strong temporal modeling capability. Moreover, the duration varies widely from one action instance to another. Although previous works have attempted to address this difficulty, it remains a persistent problem. To address it further, we propose an action detection network based on a temporal feature pyramid, which can take data collected by cameras and predict precise action categories and temporal localizations. Specifically, we introduce a temporal adaptive module that mixes self-attention and 1D convolution to flexibly adjust the temporal receptive field, improving temporal modeling for actions of different durations. We also propose a channel adaptive module that adjusts channel weights and suppresses useless information. Integrating the two modules yields the Temporal Adaptive Feature Pyramid Network (TAFPN), which adaptively extracts multi-scale temporal information. We further replace the traditional parallel head with a unified head built by stacking channel adaptive modules, which simplifies the network structure. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our method is competitive with state-of-the-art methods, which demonstrates its effectiveness.
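
The abstract gives no equations or layer definitions, so the following is only a minimal sketch of the general idea, not the authors' TAFPN. It mixes a global self-attention branch with a local 1D-convolution branch, rescales channels with a gate, and repeats this over a stride-2 temporal pyramid. Every name here (`self_attention`, `depthwise_conv1d`, `temporal_mix`, `channel_gate`, `temporal_pyramid`) is hypothetical, as are the fixed blend weight `alpha`, the fixed smoothing kernel, the projection-free attention, the squeeze-and-excitation-style gate, and the pairwise-average downsampling. A real model would learn these parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Global branch: scaled dot-product attention over time.

    feats is a T x C list of lists. No learned projections (assumption).
    """
    scale = 1.0 / math.sqrt(len(feats[0]))
    channels = range(len(feats[0]))
    out = []
    for q in feats:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in feats]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, feats)) for c in channels])
    return out


def depthwise_conv1d(feats, kernel):
    """Local branch: zero-padded temporal convolution, same kernel per channel."""
    T, C = len(feats), len(feats[0])
    half = len(kernel) // 2
    out = []
    for t in range(T):
        row = [0.0] * C
        for j, w in enumerate(kernel):
            s = t + j - half
            if 0 <= s < T:
                for c in range(C):
                    row[c] += w * feats[s][c]
        out.append(row)
    return out


def temporal_mix(feats, kernel=(0.25, 0.5, 0.25), alpha=0.5):
    """Blend the global and local branches with a fixed weight (assumption)."""
    g = self_attention(feats)
    l = depthwise_conv1d(feats, kernel)
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(rg, rl)]
            for rg, rl in zip(g, l)]


def channel_gate(feats):
    """Rescale each channel by a sigmoid of its temporal mean (assumed SE-style gate)."""
    T, C = len(feats), len(feats[0])
    means = [sum(row[c] for row in feats) / T for c in range(C)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[g * x for g, x in zip(gates, row)] for row in feats]


def temporal_pyramid(feats, levels=3):
    """Apply mix + gate at each level, then halve the temporal length."""
    pyramid, cur = [], feats
    for _ in range(levels):
        cur = channel_gate(temporal_mix(cur))
        pyramid.append(cur)
        if len(cur) < 2:
            break
        # Stride-2 downsampling by averaging adjacent time steps.
        cur = [[(a + b) / 2.0 for a, b in zip(cur[i], cur[i + 1])]
               for i in range(0, len(cur) - 1, 2)]
    return pyramid
```

Calling `temporal_pyramid` on an 8-step, 4-channel sequence returns three levels whose temporal lengths halve at each step. Coarser levels cover longer time spans per step, which is the usual motivation for a temporal pyramid when action durations vary.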

Updated: 2024-01-28