Transformer-based rapid human pose estimation network

Cited by: 6
Authors
Wang, Dong [1 ]
Xie, Wenjun [2 ,3 ]
Cai, Youcheng [1 ]
Li, Xinjie [1 ]
Liu, Xiaoping [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Hefei Univ Technol, Sch Software, Hefei 230009, Peoples R China
[3] Hefei Univ Technol, Anhui Prov Key Lab Ind Safety & Emergency Technol, Hefei 230601, Peoples R China
Source
COMPUTERS & GRAPHICS-UK | 2023, Vol. 116
Keywords
Transformer architecture; Human pose estimation; Inference speed; Computational cost; ACTION RECOGNITION; SKELETON;
DOI
10.1016/j.cag.2023.09.001
CLC Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Most current human pose estimation methods pursue high accuracy with large models and intensive computation, resulting in slow inference. Their high memory and computational costs make them impractical for real-world applications. To strike a trade-off between accuracy and efficiency, we propose TRPose, a Transformer-based network for rapid human pose estimation. TRPose seamlessly combines an early convolutional stage with a later Transformer stage. Concretely, the convolutional stage forms a Rapid Fusion Module (RFM), which efficiently acquires multi-scale features via three parallel convolution branches. The Transformer stage uses multi-resolution Transformers to build a Dual-scale Encoder Module (DEM), which learns long-range dependencies across the whole set of human skeletal keypoints from features at different scales. Experiments show that TRPose attains 74.3 AP on the COCO validation set and 73.8 AP on the test-dev set at 170+ FPS on a GTX 2080Ti, a better efficiency-effectiveness trade-off than most state-of-the-art methods. Our model also outperforms mainstream Transformer-based architectures on the MPII dataset, yielding an 89.9 PCK@0.5 score on the val set without extra data. (c) 2023 Elsevier Ltd. All rights reserved.
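The record describes the architecture only at a high level. As a rough illustration of how a hybrid RFM-plus-DEM design of this kind could be wired, here is a minimal PyTorch sketch. The kernel sizes, channel width, fusion by summation, the class names (RapidFusionModule, DualScaleEncoderModule, TRPoseSketch), and the omission of positional encodings are all assumptions made for readability; for the authors' actual implementation, see the paper via the DOI above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RapidFusionModule(nn.Module):
        # RFM sketch: three parallel convolution branches with different
        # kernel sizes (an assumption; the abstract only says "three parallel
        # convolution branches"), summed and normalized into fused features.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
            self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=2, padding=2)
            self.b7 = nn.Conv2d(in_ch, out_ch, kernel_size=7, stride=2, padding=3)
            self.norm = nn.BatchNorm2d(out_ch)

        def forward(self, x):
            return F.relu(self.norm(self.b3(x) + self.b5(x) + self.b7(x)))

    class DualScaleEncoderModule(nn.Module):
        # DEM sketch: one Transformer encoder per resolution; each spatial
        # position becomes a token, so self-attention can relate distant
        # keypoints. Positional encodings are omitted for brevity.
        def __init__(self, dim, depth=4, heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(dim, heads,
                                               dim_feedforward=2 * dim,
                                               batch_first=True)
            self.enc_hi = nn.TransformerEncoder(layer, depth)  # fine scale
            self.enc_lo = nn.TransformerEncoder(layer, depth)  # coarse scale

        @staticmethod
        def _encode(encoder, feat):
            b, c, h, w = feat.shape
            tokens = feat.flatten(2).transpose(1, 2)           # (B, H*W, C)
            return encoder(tokens).transpose(1, 2).reshape(b, c, h, w)

        def forward(self, hi, lo):
            hi = self._encode(self.enc_hi, hi)
            lo = self._encode(self.enc_lo, lo)
            lo = F.interpolate(lo, size=hi.shape[-2:], mode="bilinear",
                               align_corners=False)
            return hi + lo                                     # fuse both scales

    class TRPoseSketch(nn.Module):
        # Toy end-to-end wiring: conv stem -> two RFM stages -> DEM -> heatmaps.
        def __init__(self, num_joints=17, dim=64):
            super().__init__()
            self.stem = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.ReLU())  # 1/2
            self.rfm1 = RapidFusionModule(dim, dim)            # 1/4 resolution
            self.rfm2 = RapidFusionModule(dim, dim)            # 1/8 resolution
            self.dem = DualScaleEncoderModule(dim)
            self.head = nn.Conv2d(dim, num_joints, kernel_size=1)  # joint heatmaps

        def forward(self, img):
            hi = self.rfm1(self.stem(img))                     # fine features
            lo = self.rfm2(hi)                                 # coarse features
            return self.head(self.dem(hi, lo))

    model = TRPoseSketch()
    heatmaps = model(torch.randn(1, 3, 128, 96))               # -> (1, 17, 32, 24)

Self-attention cost grows quadratically with token count, which is why the sketch applies the encoders only at 1/4 and 1/8 resolution; the speed and accuracy figures quoted in the abstract depend on the authors' exact design, not on this toy wiring.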
Pages: 317-326
Number of pages: 10
Related Papers
50 in total
  • [1] A Transformer-Based Network for Full Object Pose Estimation with Depth Refinement
    Abdulsalam, Mahmoud
    Ahiska, Kenan
    Aouf, Nabil
    ADVANCED INTELLIGENT SYSTEMS, 2024, 6 (10)
  • [2] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10): 3100-3110
  • [3] AiPE: A Novel Transformer-Based Pose Estimation Method
    Lu, Kai
    Min, Dugki
    ELECTRONICS, 2024, 13 (05)
  • [4] Transformer-based weakly supervised 3D human pose estimation
    Wu, Xiao-guang
    Xie, Hu-jie
    Niu, Xiao-chen
    Wang, Chen
    Wang, Ze-lei
    Zhang, Shi-wen
    Shan, Yu-ze
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2025, 109
  • [5] Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
    Li, Wenhao
    Liu, Mengyuan
    Liu, Hong
    Wang, Pichao
    Cai, Jialun
    Sebe, Nicu
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024: 604-613
  • [6] Transformer-based 3D Human pose estimation and action achievement evaluation
    Yang, Aolei
    Zhou, Yinghong
    Yang, Banghua
    Xu, Yulin
Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2024, 45 (04): 136-144
  • [7] TPSFusion: A Transformer-based pyramid screening fusion network for 6D pose estimation
    Zhu, Jiaqi
    Li, Bin
    Zhao, Xinhua
    IMAGE AND VISION COMPUTING, 2025, 154
  • [8] A Transformer-based multi-modal fusion network for 6D pose estimation
    Hong, Jia-Xin
    Zhang, Hong-Bo
    Liu, Jing-Hua
    Lei, Qing
    Yang, Li-Jie
    Du, Ji-Xiang
    INFORMATION FUSION, 2024, 105
  • [9] Multi-hypothesis representation learning for transformer-based 3D human pose estimation
    Li, Wenhao
    Liu, Hong
    Tang, Hao
    Wang, Pichao
    PATTERN RECOGNITION, 2023, 141
  • [10] Human pose estimation in complex background videos via Transformer-based multi-scale feature integration
    Cheng, Chen
    Xu, Huahu
    DISPLAYS, 2024, 84