Predicting Human-Object Interactions in Egocentric Videos

Cited by: 2
Authors
Benavent-Lledo, Manuel [1 ]
Oprea, Sergiu [1 ]
Castro-Vargas, John Alejandro [1 ]
Mulero-Perez, David [1 ]
Garcia-Rodriguez, Jose [1 ]
Affiliations
[1] Univ Alicante, Dept Comp Technol, Alicante, Spain
Keywords
YOLO; egocentric; action estimation; hand-object interaction; object recognition;
DOI
10.1109/IJCNN55064.2022.9892910
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric videos provide a rich source of hand-object interactions that support action recognition. However, prior to action recognition, the hands and objects present in the scene must first be detected. In this work, we propose an action estimation architecture based on the simultaneous detection of the hands and objects in the scene. For hand and object detection, we adapted the well-known YOLO architecture, leveraging its inference speed and accuracy, and experimentally determined the best-performing variant for our task. After obtaining the hand and object bounding boxes, we select the objects most likely to be interacted with, i.e., the objects closest to a hand. This rough estimate of the objects closest to a hand is a direct approach to determining hand-object interaction. After identifying the scene, and together with a set of per-object and global actions, we can determine the most suitable action being performed in each context.
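The "closest object to a hand" selection described in the abstract could be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the (x, y, w, h) box format and the Euclidean center-to-center distance are assumptions, since the abstract does not specify either.

```python
import math

def center(box):
    """Center point of an (x, y, w, h) bounding box (assumed format)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def closest_object(hand_box, object_boxes):
    """Index of the object box whose center is nearest the hand's center.

    A rough proxy for hand-object interaction: the metric (Euclidean
    distance between box centers) is an assumption for illustration.
    """
    hx, hy = center(hand_box)

    def dist_to_hand(box):
        ox, oy = center(box)
        return math.hypot(ox - hx, oy - hy)

    return min(range(len(object_boxes)),
               key=lambda i: dist_to_hand(object_boxes[i]))

# Example: the hand overlaps the first object, so it is selected.
hand = (100, 100, 40, 40)
objects = [(110, 110, 30, 30),   # nearby object
           (400, 300, 50, 50)]   # distant object
print(closest_object(hand, objects))  # → 0
```

In a full pipeline, `hand_box` and `object_boxes` would come from the YOLO detector's per-frame outputs, and the selected object would be paired with the detected scene to look up a plausible action.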
Pages: 7
Related Papers
50 records total
  • [1] The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain
    Ragusa, Francesco
    Furnari, Antonino
    Livatino, Salvatore
    Farinella, Giovanni Maria
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1568 - 1577
  • [2] Explicit Modeling of Human-Object Interactions in Realistic Videos
    Prest, Alessandro
    Ferrari, Vittorio
    Schmid, Cordelia
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (04) : 835 - 848
  • [3] Skew-Robust Human-Object Interactions in Videos
    Agarwal, Apoorva
    Dabral, Rishabh
    Jain, Arjun
    Ramakrishnan, Ganesh
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5087 - 5096
  • [4] Predicting the Location of "interactees" in Novel Human-Object Interactions
    Chen, Chao-Yeh
    Grauman, Kristen
    [J]. COMPUTER VISION - ACCV 2014, PT I, 2015, 9003 : 351 - 367
  • [5] Detecting Human-Object Relationships in Videos
    Ji, Jingwei
    Desai, Rishi
    Niebles, Juan Carlos
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8086 - 8096
  • [6] Spatio-Temporal Human-Object Interactions for Action Recognition in Videos
    Escorcia, Victor
    Niebles, Juan Carlos
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, : 508 - 514
  • [7] Detecting human-object interactions in videos by modeling the trajectory of objects and human skeleton
    Li, Qiyue
    Xie, Xuemei
    Zhang, Chen
    Zhang, Jin
    Shi, Guangming
    [J]. NEUROCOMPUTING, 2022, 509 : 234 - 243
  • [8] Feature-based Egocentric Grasp Pose Classification for Expanding Human-Object Interactions
    Besari, Adnan Rachmat Anom
    Saputra, Azhar Aulia
    Chin, Wei Hong
    Kubota, Naoyuki
    Kurnianingsih
    [J]. PROCEEDINGS OF 2021 IEEE 30TH INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2021,
  • [9] Egocentric Human-Object Interaction Detection Exploiting Synthetic Data
    Leonardi, Rosario
    Ragusa, Francesco
    Furnari, Antonino
    Farinella, Giovanni Maria
    [J]. IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II, 2022, 13232 : 237 - 248
  • [10] Learning a Generative Model for Multi-Step Human-Object Interactions from Videos
    Wang, He
    Pirk, Soren
    Yumer, Ersin
    Kim, Vladimir G.
    Sener, Ozan
    Sridhar, Srinath
    Guibas, Leonidas J.
    [J]. COMPUTER GRAPHICS FORUM, 2019, 38 (02) : 367 - 378