GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes

Cited by: 78
Authors
Corona, Enric [1, 2]
Pumarola, Albert [1]
Alenya, Guillem [1]
Moreno-Noguer, Francesc [1]
Rogez, Gregory [2]
Affiliations
[1] CSIC UPC, Inst Robot & Informat Ind, Barcelona, Spain
[2] NAVER LABS Europe, Meylan, France
DOI
10.1109/CVPR42600.2020.00508
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The rise of deep learning has brought remarkable progress in estimating hand geometry from images in which the hands are part of the scene. This paper focuses on a new problem not explored so far: predicting how a human would grasp one or several objects, given a single RGB image of these objects. This problem has enormous potential in, e.g., augmented reality, robotics, and prosthetic design. To predict feasible grasps, we need to understand the semantic content of the image, its geometric structure, and all potential interactions with a physical hand model. To this end, we introduce a generative model that jointly reasons at all these levels and 1) regresses the 3D shape and pose of the objects in the scene; 2) estimates the grasp types; and 3) refines the 51 DoF of a 3D hand model to minimize a graspability loss. To train this model, we build the YCB-Affordance dataset, which contains more than 133k images of 21 objects in the YCB-Video dataset [69]. We have annotated these images with more than 28M plausible 3D human grasps according to a 33-class taxonomy. A thorough evaluation on synthetic and real images shows that our model can robustly predict realistic grasps, even in cluttered scenes with multiple objects in close contact.
Pages: 5030-5040
Number of pages: 11