Learning Reliable Dense Pseudo-Labels for Point-Level Weakly-Supervised Action Localization
Neural Processing Letters (IF 3.1), Pub Date: 2024-04-10, DOI: 10.1007/s11063-024-11598-w
Yuanjie Dang, Guozhu Zheng, Peng Chen, Nan Gao, Ruohong Huan, Dongdong Zhao, Ronghua Liang

Point-level weakly-supervised temporal action localization aims to accurately recognize and localize action segments in untrimmed videos using only point-level annotations during training. Current methods focus primarily on mining sparse pseudo-labels and generating dense pseudo-labels. However, due to the sparsity of point-level labels and the influence of scene information on action representations, the reliability of dense pseudo-label methods remains an issue. In this paper, we propose a point-level weakly-supervised temporal action localization method based on local representation enhancement and global temporal optimization. The method comprises two modules that enhance the representational capacity of action features and improve the reliability of class activation sequence classification, thereby making dense pseudo-labels more reliable and strengthening the model’s capability for completeness learning. Specifically, we first generate representative features of actions from pseudo-labeled features and compute weights from the feature similarity between these representative features and segment features to adjust the class activation sequence. Additionally, we maintain fixed-length queues of annotated segments and design an inter-video action contrastive learning framework. The experimental results demonstrate that our modules indeed enhance the model’s capability for completeness learning, particularly achieving state-of-the-art results at high IoU thresholds.
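As a rough illustration of the first module (local representation enhancement), the PyTorch sketch below shows how representative class features might be aggregated from pseudo-labeled segments and how cosine similarity to those prototypes could re-weight a class activation sequence. The tensor shapes, the mean-pooled prototype, and the similarity-to-weight mapping are assumptions for illustration; `adjust_cas_with_prototypes` is a hypothetical helper, not the authors' code.

```python
# Minimal sketch (not the paper's implementation) of similarity-weighted
# class activation sequence (CAS) adjustment using class prototypes built
# from dense pseudo-labels.

import torch
import torch.nn.functional as F


def adjust_cas_with_prototypes(features, cas, pseudo_labels):
    """
    features:      (T, D)  segment features of one video
    cas:           (T, C)  class activation sequence (pre-adjustment scores)
    pseudo_labels: (T, C)  dense pseudo-labels in {0, 1}
    returns:       (T, C)  similarity-weighted CAS
    """
    # Representative (prototype) feature per class: mean of the features
    # on segments pseudo-labeled with that class.
    counts = pseudo_labels.sum(dim=0).clamp(min=1.0)                # (C,)
    prototypes = (pseudo_labels.t() @ features) / counts[:, None]   # (C, D)

    # Cosine similarity between every segment feature and every prototype.
    sim = F.cosine_similarity(
        features.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)     # (T, C)

    # Map similarity from [-1, 1] to [0, 1] and use it as a per-segment,
    # per-class weight on the activation scores.
    weights = (sim + 1.0) / 2.0
    return cas * weights


# Toy usage with random tensors.
feats = torch.randn(50, 128)                 # 50 segments, 128-d features
cas = torch.rand(50, 20)                     # 20 action classes
pl = (torch.rand(50, 20) > 0.9).float()      # sparse dense pseudo-labels
adjusted = adjust_cas_with_prototypes(feats, cas, pl)
print(adjusted.shape)                        # torch.Size([50, 20])
```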

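For the second module's global component, the sketch below assumes the fixed-length queues are per-class FIFO buffers of annotated-segment features and that the inter-video contrast uses an InfoNCE-style objective; the class `ClassFeatureQueues` and its methods are illustrative, not the paper's API.

```python
# Minimal sketch (an assumption, not the authors' code) of fixed-length
# per-class feature queues and an inter-video contrastive loss that pulls
# a query feature toward same-class queue entries and away from others.

import torch
import torch.nn.functional as F


class ClassFeatureQueues:
    def __init__(self, num_classes, feat_dim, queue_len=64):
        self.queue_len = queue_len
        self.queues = [torch.zeros(0, feat_dim) for _ in range(num_classes)]

    def enqueue(self, cls, feats):
        """Append annotated-segment features of class `cls`, keep the last queue_len."""
        q = torch.cat([self.queues[cls], feats.detach()], dim=0)
        self.queues[cls] = q[-self.queue_len:]

    def contrastive_loss(self, query, cls, temperature=0.07):
        """InfoNCE-style loss over all queued features; positives are same-class entries."""
        keys = torch.cat(self.queues, dim=0)                              # (N, D)
        labels = torch.cat([
            torch.full((len(q),), c, dtype=torch.long)
            for c, q in enumerate(self.queues)])                          # (N,)
        if keys.numel() == 0 or (labels == cls).sum() == 0:
            return query.sum() * 0.0                                      # nothing to contrast yet
        logits = F.cosine_similarity(query[None, :], keys, dim=-1) / temperature
        log_prob = logits - torch.logsumexp(logits, dim=0)
        return -log_prob[labels == cls].mean()
```

The FIFO buffer here is akin to a MoCo-style memory bank, which lets segments from different videos serve as positives and negatives; whether the paper uses this exact loss form is not stated in the abstract.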



Updated: 2024-04-10