A cross-feature interaction network for 3D human pose estimation

Authors
Peng, Jihua [1]
Zhou, Yanghong [3]
Mok, P. Y. [1, 2, 4, 5]
Affiliations
[1] Hong Kong Polytech Univ, Sch Fash & Text, Hong Kong, Peoples R China
[2] Lab Artificial Intelligence Design, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Res Ctr Text Future Fash, Hong Kong, Peoples R China
[4] Hong Kong Polytech Univ, Res Inst Sports Sci & Technol, Hong Kong, Peoples R China
[5] Hong Kong Univ Sci & Technol, Div Integrat Syst & Design, Hong Kong, Peoples R China
Keywords
3D human pose estimation; graph convolutional network (GCN); self-attention; cross-attention
DOI
10.1016/j.patrec.2025.01.016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of estimating 3D human poses from single monocular images is challenging because, unlike video sequences, single images can hardly provide any temporal information for the prediction. Most existing methods attempt to predict 3D poses by modeling the spatial dependencies inherent in the anatomical structure of the human skeleton, yet these methods fail to capture the complex local and global relationships that exist among various joints. To solve this problem, we propose a novel Cross-Feature Interaction Network to effectively model spatial correlations between body joints. Specifically, we exploit graph convolutional networks (GCNs) to learn the local features between neighboring joints and the self-attention structure to learn the global features among all joints. We then design a cross-feature interaction (CFI) module to facilitate cross-feature communications among the three different features, namely the local features, global features, and initial 2D pose features, aggregating them to form enhanced spatial representations of human pose. Furthermore, a novel graph-enhanced module (GraMLP) with parallel GCN and multi-layer perceptron is introduced to inject the skeletal knowledge of the human body into the final representation of 3D pose. Extensive experiments on two datasets (Human3.6M (Ionescu et al., 2013) and MPI-INF-3DHP (Mehta et al., 2017)) show the superior performance of our method in comparison to existing state-of-the-art (SOTA) models. The code and data are shared at https://github.com/JihuaPeng/CFI-3DHPE
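As a rough illustration of the three feature types named in the abstract, the sketch below computes local features with one graph-convolution step, global features with self-attention, and then lets the local features attend to the global ones through cross-attention before summing everything with the initial 2D pose features. It is not the authors' implementation (the linked repository is the reference for that). The five-joint chain skeleton, the feature width, the random weights, the choice of which feature supplies the queries, and the additive fusion are all assumptions made purely for illustration.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q_src, kv_src, wq, wk, wv):
    """Scaled dot-product attention; self-attention when q_src is kv_src."""
    q, k, v = matmul(q_src, wq), matmul(kv_src, wk), matmul(kv_src, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def normalized_adjacency(edges, n):
    """Adjacency with self-loops, row-normalized so each row sums to 1."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_joints, dim = 5, 4                      # assumed toy sizes
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]    # hypothetical chain skeleton

# Initial 2D pose features: (x, y) per joint, embedded to `dim` channels.
pose_2d = rand_matrix(rng, num_joints, 2)
initial = matmul(pose_2d, rand_matrix(rng, 2, dim))

# Local features: one graph-convolution step over neighboring joints.
adj = normalized_adjacency(edges, num_joints)
local_feat = matmul(matmul(adj, initial), rand_matrix(rng, dim, dim))

# Global features: self-attention among all joints.
self_w = [rand_matrix(rng, dim, dim) for _ in range(3)]
global_feat = attention(initial, initial, *self_w)

# Cross-attention (assumed direction): local queries, global keys and values.
cross_w = [rand_matrix(rng, dim, dim) for _ in range(3)]
cross_feat = attention(local_feat, global_feat, *cross_w)

# Assumed fusion: element-wise sum of the three streams.
fused = [
    [a + b + c for a, b, c in zip(r_init, r_local, r_cross)]
    for r_init, r_local, r_cross in zip(initial, local_feat, cross_feat)
]
```

The result `fused` holds one `dim`-channel vector per joint. In a real model the weights would be learned, and a regression head would map the fused features to 3D joint coordinates.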
Pages: 175-181
Page count: 7