PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation

Times cited: 0
Authors
Zhu, Yiran [1]
Xu, Xing [1]
Shen, Fumin [1]
Ji, Yanli [1]
Gao, Lianli [1]
Shen, Heng Tao [1]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Graph neural networks (GNNs) have been widely used for 3D human pose estimation, since the pose of a human body can be naturally modeled as a graph. However, most existing GNN-based models rely on filters with restricted receptive fields and on single-scale information, neglecting valuable multi-scale contextual information. To tackle this issue, we propose a novel model named Graph Transformer Encoder-Decoder with Atrous Convolution (PoseGTAC) to effectively extract multi-scale context and long-range information. Specifically, PoseGTAC has two key components: Graph Atrous Convolution (GAC), which extracts local multi-scale information, and the Graph Transformer Layer (GTL), which extracts global long-range information. These components are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling let multi-scale information interact from local to global scales (e.g., part scale and body scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that PoseGTAC achieves state-of-the-art performance.
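The abstract describes the architecture only at a high level. As a rough illustration, the pure-Python sketch below assumes that graph atrous convolution aggregates features from joints exactly d hops away on the skeleton graph (a graph analogue of dilation rate d), that the graph transformer layer is plain single-head self-attention over all nodes, and that pooling and unpooling average joints into body-part groups and copy the result back. The skeleton, grouping, dilation rates, and all function names are hypothetical, learnable weights are omitted, and this should not be read as the authors' actual formulation.

```python
import math
from collections import deque

# Toy sketch only: every layer below is an illustrative assumption,
# not the PoseGTAC implementation. Learnable weights are omitted.

def hop_distances(num_nodes, edges):
    """All-pairs hop counts on an undirected skeleton graph (BFS from each joint)."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for s in range(num_nodes):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] == math.inf:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist

def atrous_adjacency(dist, rate):
    """Row-normalised adjacency linking each joint to itself and to joints exactly `rate` hops away."""
    n = len(dist)
    a = [[1.0 if i == j or dist[i][j] == rate else 0.0 for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def graph_atrous_conv(x, dist, rates=(1, 2)):
    """Multi-scale local aggregation: mean of the neighbourhood averages at each dilation rate."""
    outs = [matmul(atrous_adjacency(dist, r), x) for r in rates]
    return [[sum(o[i][j] for o in outs) / len(outs) for j in range(len(x[0]))]
            for i in range(len(x))]

def self_attention(x):
    """Single-head scaled dot-product attention over all nodes (Q = K = V = x for brevity)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * v[j] for wi, v in zip(w, x)) for j in range(d)])
    return out

def graph_pool(x, groups):
    """Average joint features inside each group (e.g. a body part)."""
    return [[sum(x[i][j] for i in g) / len(g) for j in range(len(x[0]))] for g in groups]

def graph_unpool(y, groups, num_nodes):
    """Copy each group feature back to its member joints."""
    out = [None] * num_nodes
    for feat, g in zip(y, groups):
        for i in g:
            out[i] = list(feat)
    return out

# Hypothetical 5-joint skeleton: 0 pelvis, 1 spine, 2 head, 3 left hip, 4 right hip.
edges = [(0, 1), (1, 2), (0, 3), (0, 4)]
groups = [[0, 1, 2], [3, 4]]  # hypothetical "part-scale" grouping
x = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [-0.2, -0.1], [0.2, -0.1]]  # toy 2D keypoints

dist = hop_distances(len(x), edges)
h = graph_atrous_conv(x, dist)                 # local multi-scale context (joint scale)
h = add(h, self_attention(h))                  # global long-range context, residual
p = graph_pool(h, groups)                      # encoder: joint scale -> part scale
p = add(p, self_attention(p))                  # global context at part scale
out = add(h, graph_unpool(p, groups, len(x)))  # decoder: unpool plus skip connection
```

In this sketch, a larger dilation rate widens each joint's receptive field without stacking more layers, in the spirit of atrous convolution on image grids, while the attention step lets every joint attend to every other joint directly, which is one plausible reading of the "long-range" context mentioned above.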
Pages: 1359-1365
Page count: 7