Articulated 3D Human-Object Interactions From RGB Videos: An Empirical Analysis of Approaches and Challenges

Cited by: 0
Authors
Haresh, Sanjay [1]
Sun, Xiaohao [1]
Jiang, Hanxiao [1]
Chang, Angel X. [1]
Savva, Manolis [1]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1109/3DV57658.2022.00043
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person manipulating the object. We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video, and carry out a systematic benchmark of five families of methods for this task: 3D plane estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting, and free-form mesh fitting. Our experiments show that all methods struggle to obtain high accuracy results even when provided ground truth information about the observed objects. We identify key factors which make the task challenging and suggest directions for future work on this challenging 3D computer vision task.
Pages: 312-321
Number of pages: 10