SpATr: MoCap 3D human action recognition based on spiral auto-encoder and transformer network
Computer Vision and Image Understanding (IF 4.5). Pub Date: 2024-02-17. DOI: 10.1016/j.cviu.2024.103974
Hamza Bouzid, Lahoucine Ballihi

Recent technological advancements have significantly expanded the potential of human action recognition by harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Prior methods rely on sampling 2D depth images, skeleton points, or point clouds, which often leads to substantial memory requirements and limits them to short sequences. In contrast, we introduce a novel approach for 3D human action recognition, denoted SpATr (Spiral Auto-encoder and Transformer network), specifically designed for fixed-topology mesh sequences. The SpATr model disentangles space and time in the mesh sequences. A lightweight auto-encoder, based on spiral convolutions, extracts spatial geometric features from each 3D mesh; these convolutions are lightweight and specifically designed for fixed-topology mesh data. Subsequently, a temporal transformer, based on self-attention, captures the temporal context within the feature sequence. The self-attention mechanism enables the capture of long-range dependencies and supports parallel processing, which makes the model scalable to long sequences. The proposed method is evaluated on three prominent 3D human action datasets from the Archive of Motion Capture As Surface Shapes (AMASS): Babel, MoVi, and BMLrub. Our analysis demonstrates the competitive performance of the SpATr model in 3D human action recognition while maintaining efficient memory usage. The code and the training results are publicly available at .
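The abstract's two-stage design (a spatial encoder producing one feature vector per mesh frame, followed by a self-attention layer that contextualises the whole sequence at once) can be illustrated with a minimal sketch. This is not the authors' implementation: the shapes, random weights, and single-head attention below are illustrative assumptions, standing in for the spiral-convolution embeddings and the full transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(frame_feats, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of per-frame features.

    frame_feats: (T, d) array, one d-dim embedding per mesh frame
    (in SpATr these would come from the spiral auto-encoder).
    Wq, Wk, Wv: (d, d) learned projection matrices.
    """
    Q, K, V = frame_feats @ Wq, frame_feats @ Wk, frame_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (T, T) pairwise frame affinities
    attn = softmax(scores, axis=-1)          # every frame attends to all frames
    return attn @ V                          # temporally contextualised features

rng = np.random.default_rng(0)
T, d = 16, 8                                 # toy values: 16 frames, 8-dim embeddings
feats = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = temporal_self_attention(feats, Wq, Wk, Wv)
print(out.shape)  # (16, 8): same sequence length, each frame now sequence-aware
```

Because each output frame is a weighted sum over all T frames, dependencies of arbitrary temporal range are captured in one matrix product rather than by sequential recurrence, which is the scalability property the abstract highlights.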

Updated: 2024-02-17