Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

Cited by: 0
Authors
Fu, Qichen [1]
Liu, Xingyu [1]
Xu, Ran [2]
Niebles, Juan Carlos [2]
Kitani, Kris M. [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Salesforce Res, San Francisco, CA USA
DOI
10.1109/ICCV51070.2023.02157
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Accurately estimating 3D hand pose is crucial for understanding how humans interact with the world. Despite remarkable progress, existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred. In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame. To adaptively leverage the visual cues before and after the occlusion or blurring for robust hand pose estimation, we propose Deformer, a framework that implicitly reasons about the relationship between hand parts within the same image (spatial dimension) and across different timesteps (temporal dimension). We show that a naive application of the transformer self-attention mechanism is not sufficient, because motion blur or occlusions in certain frames can lead to heavily distorted hand features and generate imprecise keys and queries. To address this challenge, we incorporate a Dynamic Fusion Module into Deformer, which predicts the deformation of the hand and warps the hand mesh predictions from nearby frames to explicitly support the current frame estimation. Furthermore, we observe that errors are unevenly distributed across different hand parts, with vertices around fingertips having disproportionately higher errors than those around the palm. We mitigate this issue by introducing a new loss function called maxMSE that automatically adjusts the weight of every vertex to focus the model on critical hand parts. Extensive experiments show that our method significantly outperforms state-of-the-art methods by 10%, and is more robust to occlusions (over 14%).
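The abstract describes maxMSE only at a high level: a loss that automatically reweights vertices so that high-error regions such as fingertips receive more attention. This record does not give the formula, so the following is a minimal sketch of one plausible error-weighted scheme, not the authors' definition. The function name `error_weighted_mse`, the normalization of the weights, and the `eps` parameter are all assumptions made for illustration.

```python
import numpy as np


def error_weighted_mse(pred, target, eps=1e-8):
    """Hypothetical error-weighted MSE over mesh vertices.

    pred, target: arrays of shape (V, 3) holding 3D vertex positions.
    Each vertex is weighted by its share of the total squared error,
    so vertices with larger errors dominate the loss.
    NOTE: this weighting is an assumption, not the published maxMSE formula.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Per-vertex squared Euclidean error, shape (V,)
    sq_err = np.sum((pred - target) ** 2, axis=-1)
    # Weights proportional to each vertex's error; they sum to roughly 1
    weights = sq_err / (np.sum(sq_err) + eps)
    return float(np.sum(weights * sq_err))


def plain_mse(pred, target):
    """Uniformly weighted per-vertex squared error, for comparison."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))
```

Under this sketch, the two losses agree when every vertex has the same error, and the weighted version grows larger than the plain mean when the error is concentrated on a few vertices, which is the behavior the abstract attributes to focusing on critical hand parts.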
Pages: 23543-23554
Number of pages: 12