Predicting Human-Object Interactions in Egocentric Videos

Cited by: 2
Authors
Benavent-Lledo, Manuel [1 ]
Oprea, Sergiu [1 ]
Castro-Vargas, John Alejandro [1 ]
Mulero-Perez, David [1 ]
Garcia-Rodriguez, Jose [1 ]
Affiliations
[1] Univ Alicante, Dept Comp Technol, Alicante, Spain
Keywords
YOLO; egocentric; action estimation; hand-object interaction; object recognition;
DOI
10.1109/IJCNN55064.2022.9892910
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric videos provide a rich source of hand-object interactions that support action recognition. However, prior to action recognition, the hands and objects present in the scene must first be detected. In this work, we propose an action estimation architecture based on the simultaneous detection of hands and objects in the scene. For hand and object detection, we adapted the well-known YOLO architecture, leveraging its inference speed and accuracy, and experimentally determined the best-performing variant for our task. After obtaining the hand and object bounding boxes, we select the objects most likely to be interacted with, i.e., those closest to a hand. This rough estimate of the closest objects to a hand provides a direct way to determine hand-object interaction. By identifying the scene, alongside a set of per-object and global actions, we can determine the most suitable action being performed in each context.
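The closest-object selection step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes detections are axis-aligned boxes in `(x1, y1, x2, y2)` format, and uses centroid distance as the hand-object proximity measure (the paper does not specify the exact distance used).

```python
import math

def centroid(box):
    """Center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def closest_object(hand_box, object_boxes):
    """Pick the object box whose centroid is nearest the hand's centroid.

    Hypothetical helper illustrating the paper's rough interaction
    estimate; centroid distance is an assumption.
    """
    hx, hy = centroid(hand_box)
    return min(object_boxes,
               key=lambda b: math.dist((hx, hy), centroid(b)))

# Example: a hand near one of two detected objects.
hand = (100, 100, 150, 160)
objects = [(300, 300, 340, 350), (120, 90, 180, 140)]
print(closest_object(hand, objects))  # → (120, 90, 180, 140)
```

In the full pipeline, `hand_box` and `object_boxes` would come from the adapted YOLO detector's per-frame outputs, and the selected object would feed the per-object action lookup.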
Pages: 7
Related Papers (50 total)
  • [31] Human-Object Interactions Are More than the Sum of Their Parts
    Baldassano, Christopher
    Beck, Diane M.
    Fei-Fei, Li
    [J]. CEREBRAL CORTEX, 2017, 27 (03) : 2276 - 2288
  • [33] Spatially Conditioned Graphs for Detecting Human-Object Interactions
    Zhang, Frederic Z.
    Campbell, Dylan
    Gould, Stephen
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13299 - 13307
  • [34] Exploiting multimodal synthetic data for egocentric human-object interaction detection in an industrial scenario
    Leonardi, Rosario
    Ragusa, Francesco
    Furnari, Antonino
    Farinella, Giovanni Maria
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 242
  • [35] Action Prediction Based on Physically Grounded Object Affordances in Human-Object Interactions
    Dutta, Vibekananda
    Zielinska, Teresa
    [J]. 2017 11TH INTERNATIONAL WORKSHOP ON ROBOT MOTION AND CONTROL (ROMOCO), 2017, : 41 - 46
  • [36] Novel Anomalous Event Detection based on Human-object Interactions
    Colque, Rensso Mora
    Caetano, Carlos
    de Melo, Victor C.
    Chavez, Guillermo Camara
    Schwartz, William Robson
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2018), VOL 5: VISAPP, 2018, : 293 - 300
  • [37] Exploring Predicate Visual Context in Detecting of Human-Object Interactions
    Zhang, Frederic Z.
    Yuan, Yuhui
    Campbell, Dylan
    Zhong, Zhuoyao
    Gould, Stephen
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10377 - 10387
  • [38] Modeling 4D Human-Object Interactions for Event and Object Recognition
    Wei, Ping
    Zhao, Yibiao
    Zheng, Nanning
    Zhu, Song-Chun
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 3272 - 3279
  • [39] Recognizing Human-Object Interactions Using Sparse Subspace Clustering
    Bogun, Ivan
    Ribeiro, Eraldo
    [J]. COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PT I, 2013, 8047 : 409 - 416
  • [40] Causality Inspired Retrieval of Human-object Interactions from Video
    Zhou, Liting
    Liu, Jianquan
    Nishimura, Shoji
    Antony, Joseph
    Gurrin, Cathal
    [J]. 2019 INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2019,