Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

Cited by: 5
Authors
Liu, Huan [1, 3]
Chen, Qiang [2]
Tan, Zichang [2]
Liu, Jiang-Jiang [2]
Wang, Jian [2]
Su, Xiangbo [2]
Li, Xiaolong [1, 3]
Yao, Kun [2]
Han, Junyu [2]
Ding, Errui [2]
Zhao, Yao [1, 3]
Wang, Jingdong [2]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
[2] Baidu VIS, Beijing, Peoples R China
[3] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
DOI
10.1109/ICCV51070.2023.01380
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study the problem of end-to-end multi-person pose estimation. State-of-the-art solutions adopt a DETR-like framework and mainly develop complex decoders, e.g., ED-Pose [38] regards pose estimation as keypoint box detection and combines it with human detection, and PETR [27] predicts hierarchically with a pose decoder and a joint (keypoint) decoder. We present a simple yet effective transformer approach, named Group Pose. We regard K-keypoint pose estimation as predicting a set of N × K keypoint positions, each from a keypoint query, and represent each pose with an instance query for scoring the N pose predictions. Motivated by the intuition that interaction among across-instance queries of different types is not directly helpful, we make a simple modification to decoder self-attention. We replace the single self-attention over all N × (K + 1) queries with two subsequent group self-attentions: (i) N within-instance self-attentions, each over K keypoint queries and one instance query, and (ii) (K + 1) same-type across-instance self-attentions, each over N queries of the same type. The resulting decoder removes the interaction among across-instance queries of different types, easing optimization and thus improving performance. Experimental results on MS COCO and CrowdPose show that our approach, without human box supervision, is superior to previous methods with complex decoders, and is even slightly better than ED-Pose, which uses human box supervision. Paddle and PyTorch code is available.
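The two group self-attentions described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Paddle or PyTorch implementation: the function names, the single-head attention without learned projections, and the omission of residual connections and normalization are all simplifying assumptions made here for illustration. K = 17 is chosen only because COCO annotates 17 keypoints per person; N and D are arbitrary.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(q):
    """Scaled dot-product self-attention over the second-to-last axis.

    Assumption: single head, no learned query/key/value projections.
    q has shape (groups, L, D); each group attends only within itself.
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(q, -1, -2) / np.sqrt(d)  # (groups, L, L)
    return softmax(scores, axis=-1) @ q               # (groups, L, D)


def group_self_attention(queries):
    """Hypothetical helper: queries has shape (N, K + 1, D).

    Axis 0 indexes the N instances; axis 1 holds K keypoint queries
    plus one instance query for each instance.
    """
    # (i) N within-instance self-attentions, each over K + 1 queries.
    x = self_attention(queries)
    # (ii) K + 1 same-type across-instance self-attentions, each over N queries.
    x = np.swapaxes(x, 0, 1)        # (K + 1, N, D)
    x = self_attention(x)
    return np.swapaxes(x, 0, 1)     # back to (N, K + 1, D)


N, K, D = 4, 17, 8                  # illustrative sizes only
rng = np.random.default_rng(0)
queries = rng.standard_normal((N, K + 1, D))
out = group_self_attention(queries)
```

In this sketch neither stage computes an attention weight between queries that differ in both instance and type; a single self-attention over all N × (K + 1) queries would compute one for every such pair.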
Pages: 14983-14992
Page count: 10