SPGformer: Serial–Parallel Hybrid GCN-Transformer With Graph-Oriented Encoder for 2-D-to-3-D Human Pose Estimation
IEEE Transactions on Instrumentation and Measurement (IF 5.6), Pub Date: 2024-03-25, DOI: 10.1109/tim.2024.3381701
Qin Fang, Zihan Xu, Mengxian Hu, Qinyang Zeng, Chengju Liu, Qijun Chen

Accurate acquisition of 3-D human joint poses holds significant implications for tasks such as human action recognition. Monocular single-frame 2-D-to-3-D pose estimation focuses on establishing the correspondence between the 2-D human pose in a single image and its 3-D spatial pose, delegating the preliminary task of 2-D pose estimation to models better suited to processing pixel information. The intricacy of 2-D-to-3-D pose estimation resides in modeling the spatial constraints among joints. To better learn the structure between joints, this article proposes the SPGformer algorithm, constructed from stacked serial–parallel GCN-encoder (SPGEncoder) modules. Each module forms a dual-branch framework composed of transformer encoders (Encoders) and graph-oriented encoders (GraEncoders). We recover concealed depth values from the 2-D coordinates of the joints and feed them into the joint branch of the SPGEncoder. In parallel, we take the connection features of joints in the image as the vector-branch input. The proposed GraEncoder module integrates a learnable graph convolutional network (GCN) before the Encoder, enabling the learning of a broader spectrum of joint connections within the confines of skeletal linkage. Furthermore, this article presents a method for calculating the 3-D absolute pose of the root node, filling a research gap for applications requiring precise human position. This nonlearnable, plug-and-play method has been validated on the Human3.6M dataset. The SPGformer algorithm outperforms state-of-the-art methods on both the Human3.6M and MPI-INF-3DHP datasets.
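The GraEncoder idea stated in the abstract lends itself to a short sketch: a graph convolution with a learnable adjacency applied before a standard transformer encoder layer. The following PyTorch code is a minimal, hypothetical illustration only; the class name, dimensions, and the adjacency initialization are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class GraEncoderSketch(nn.Module):
    """Hypothetical sketch: learnable-graph GCN step before a transformer encoder."""
    def __init__(self, num_joints: int, dim: int, num_heads: int = 8):
        super().__init__()
        # Learnable adjacency over joints (assumption: identity-plus-noise
        # initialization; the paper may instead initialize from the skeleton graph).
        self.adj = nn.Parameter(
            torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints)
        )
        self.gcn_proj = nn.Linear(dim, dim)  # per-joint feature transform
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim) joint-token features.
        a = torch.softmax(self.adj, dim=-1)  # row-normalized learnable graph
        x = self.gcn_proj(a @ x)             # graph-convolution aggregation
        return self.encoder(x)               # standard self-attention encoder

# Example with the 17-joint Human3.6M skeleton and a 256-dim embedding:
tokens = torch.randn(2, 17, 256)
out = GraEncoderSketch(num_joints=17, dim=256)(tokens)
print(out.shape)  # torch.Size([2, 17, 256])

Normalizing the learnable adjacency with a softmax keeps the aggregation weights bounded. Note that the actual SPGEncoder runs this graph-oriented branch in parallel with a pure transformer branch; the sketch shows only the GraEncoder side.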
