Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning

Cited by: 1
Authors
Hoang, Dinh-Cuong [1 ]
Tan, Phan Xuan [2 ]
Nguyen, Anh-Nhat [1 ]
Vu, Duy-Quang [1 ]
Vu, Van-Duc [1 ]
Nguyen, Thu-Uyen [1 ]
Hoang, Ngoc-Anh [1 ]
Phan, Khanh-Toan [1 ]
Tran, Duc-Thanh [1 ]
Nguyen, Van-Thiep [1 ]
Duong, Quang-Tri [1 ]
Ho, Ngoc-Trung [1 ]
Tran, Cong-Trinh [1 ]
Duong, Van-Hiep [1 ]
Ngo, Phuc-Quan [1 ]
Affiliations
[1] FPT Univ, IT Dept, Hanoi 10000, Vietnam
[2] Shibaura Inst Technol, Coll Engn, Koto City, Tokyo 1358548, Japan
Keywords
Feature extraction; Three-dimensional displays; Shape; Pose estimation; Image color analysis; Task analysis; Solid modeling; Robot vision systems; Intelligent systems; Deep learning; Supervised learning; Machine vision
DOI
10.1109/ACCESS.2024.3388870
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has many applications, particularly in augmented reality, virtual reality, and imitation-based robot learning. The problem is especially challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works estimate hand configurations (i.e., pose and shape parameters) in isolation from recovering the parameters of the object acted upon, we posit that the two problems are related and can be solved more accurately together. We introduce an approach that jointly learns hand and object features from color and depth (RGB-D) images. Our approach fuses appearance and geometric features in an adaptive manner, which allows us to emphasize or suppress features that are more meaningful for the downstream task of hand-object configuration recovery. We combine a deep Hough voting strategy built on these adaptive features with a graph convolutional network (GCN) that learns the interaction relationships between the hand and the held object shapes during interaction. Experimental results demonstrate that our proposed approach consistently outperforms state-of-the-art methods on popular datasets.
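The abstract's "adaptive fusion" of appearance and geometric features can be illustrated with a minimal gating sketch. This is not the authors' implementation: the one-layer gating network, the per-channel convex combination, and all parameter names below are assumptions chosen to make the idea concrete, in plain NumPy for self-containedness.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(appearance, geometric, W, b):
    """Adaptively fuse per-point appearance and geometric features.

    appearance, geometric: (N, C) feature arrays for N points.
    W: (2C, C) and b: (C,) parameterize a one-layer gating network
    (a stand-in for a learned gate; in training W and b are optimized).
    """
    joint = np.concatenate([appearance, geometric], axis=1)  # (N, 2C)
    gate = sigmoid(joint @ W + b)  # (N, C), each entry in (0, 1)
    # Convex combination per channel: gate near 1 emphasizes appearance,
    # gate near 0 emphasizes geometry.
    return gate * appearance + (1.0 - gate) * geometric

N, C = 4, 8
app = rng.standard_normal((N, C))   # e.g., RGB-branch features
geo = rng.standard_normal((N, C))   # e.g., point-cloud-branch features
W = rng.standard_normal((2 * C, C)) * 0.1
b = np.zeros(C)
fused = adaptive_fuse(app, geo, W, b)
print(fused.shape)  # (4, 8)
```

Because the gate lies in (0, 1), each fused value stays between the corresponding appearance and geometric values, which is what lets the network accent one modality per channel without discarding the other.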
Pages: 54339–54351
Page count: 13
Related Papers
50 records total
  • [1] Multi-Level Fusion Net for hand pose estimation in hand-object interaction
    Lin, Xiang-Bo
    Zhou, Yi-Dan
    Du, Kuo
    Sun, Yi
    Ma, Xiao-Hong
    Lu, Jian
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 94
  • [2] Harmonious Feature Learning for Interactive Hand-Object Pose Estimation
    Lin, Zhifeng
    Ding, Changxing
    Yao, Huan
    Kuang, Zengsheng
    Huang, Shaoli
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 12989 - 12998
  • [3] Hand Pose Estimation for Hand-Object Interaction Cases using Augmented Autoencoder
    Li, Shile
    Wang, Haojie
    Lee, Dongheui
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 993 - 999
  • [4] A survey of deep learning methods and datasets for hand pose estimation from hand-object interaction images
    Woo, Taeyun
    Park, Wonjung
    Jeong, Woohyun
    Park, Jinah
    [J]. COMPUTERS & GRAPHICS-UK, 2023, 116 : 474 - 490
  • [5] DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
    Wang, Rong
    Mao, Wei
    Li, Hongdong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Deep Gated Multi-modal Learning: In-hand Object Pose Changes Estimation using Tactile and Image Data
    Anzai, Tomoki
    Takahashi, Kuniyuki
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 9361 - 9368
  • [7] Graph-Based Hand-Object Meshes and Poses Reconstruction With Multi-Modal Input
    Almadani, Murad
    Elhayek, Ahmed
    Malik, Jameel
    Stricker, Didier
    [J]. IEEE ACCESS, 2021, 9 : 136438 - 136447
  • [9] Deep Fusion for Multi-Modal 6D Pose Estimation
    Lin, Shifeng
    Wang, Zunran
    Zhang, Shenghao
    Ling, Yonggen
    Yang, Chenguang
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2023, : 1 - 10
  • [10] Multi-Modal Sensor Fusion for Indoor Mobile Robot Pose Estimation
    Dobrev, Yassen
    Flores, Sergio
    Vossiek, Martin
    [J]. PROCEEDINGS OF THE 2016 IEEE/ION POSITION, LOCATION AND NAVIGATION SYMPOSIUM (PLANS), 2016, : 553 - 556