PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation

Cited by: 0

Authors:

Zhu, Yiran [1]

Xu, Xing [1]

Shen, Fumin [1]

Ji, Yanli [1]

Gao, Lianli [1]

Shen, Heng Tao [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China

Source:

PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Graph neural networks (GNNs) are widely used for 3D human pose estimation because the pose of a human body is naturally modeled as a graph. Most existing GNN-based models, however, rely on filters with restricted receptive fields and on single-scale information, neglecting valuable multi-scale contextual information. To address this, we propose Graph Transformer Encoder-Decoder with Atrous Convolution (PoseGTAC), a model that effectively extracts multi-scale context and long-range information. PoseGTAC has two key components: Graph Atrous Convolution (GAC), which extracts local multi-scale information, and Graph Transformer Layer (GTL), which extracts global long-range information. The two are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling let multi-scale information interact from the local to the global level (e.g., part scale and body scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that PoseGTAC achieves state-of-the-art performance.
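
The abstract names the two components but does not define them, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that "atrous" aggregation on a graph means averaging features from joints exactly `dilation` hops away, and that the global step is a plain, parameter-free softmax dot-product attention over all joints. The 5-joint chain skeleton, the function names, and the 1-D features are all hypothetical choices made for illustration.

```python
from collections import deque
import math

# Hypothetical 5-joint chain skeleton 0-1-2-3-4 (NOT the Human3.6M skeleton).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def hop_distances(num_nodes, edges):
    """All-pairs shortest hop counts, computed by BFS from every node."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[-1] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def atrous_aggregate(features, dist, dilation):
    """Assumed 'atrous' step: average the features of nodes exactly
    `dilation` hops away; a node with no such neighbour keeps its own."""
    out = []
    for i, own in enumerate(features):
        nbrs = [j for j in range(len(features)) if dist[i][j] == dilation]
        if not nbrs:
            out.append(list(own))
            continue
        out.append([sum(features[j][c] for j in nbrs) / len(nbrs)
                    for c in range(len(own))])
    return out


def global_attention(features):
    """Assumed global step: softmax dot-product attention over all joints,
    with no learned projections (a real layer would have them)."""
    scale = math.sqrt(len(features[0]))
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] / total * features[j][c]
                        for j in range(len(features)))
                    for c in range(len(q))])
    return out


features = [[float(i)] for i in range(NUM_JOINTS)]  # toy 1-D joint features
dist = hop_distances(NUM_JOINTS, EDGES)
local_1 = atrous_aggregate(features, dist, dilation=1)  # immediate neighbours
local_2 = atrous_aggregate(features, dist, dilation=2)  # skips one joint
global_ctx = global_attention(features)                 # every joint sees all
```

Increasing `dilation` widens the neighbourhood a joint draws on without touching the intermediate joints, while the attention step mixes information from the whole body at once. That is the local multi-scale versus global long-range split the abstract describes.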


Pages: 1359-1365

Page count: 7

