Predicting Human-Object Interactions in Egocentric Videos

Cited by: 2
|
Authors
Benavent-Lledo, Manuel [1]
Oprea, Sergiu [1]
Castro-Vargas, John Alejandro [1]
Mulero-Perez, David [1]
Garcia-Rodriguez, Jose [1]
Affiliations
[1] Univ Alicante, Dept Comp Technol, Alicante, Spain
Keywords
YOLO; egocentric; action estimation; hand-object interaction; object recognition
DOI
10.1109/IJCNN55064.2022.9892910
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Egocentric videos provide a rich source of hand-object interactions that support action recognition. However, prior to action recognition, the hands and objects present in the scene must first be detected. In this work, we propose an action estimation architecture based on the simultaneous detection of hands and objects in the scene. For hand and object detection, we adapted the well-known YOLO architecture, leveraging its inference speed and accuracy, and experimentally determined the best-performing variant for our task. After obtaining the hand and object bounding boxes, we select the objects most likely to be interacted with, i.e., the objects closest to a hand. This rough estimation of the objects closest to a hand is a direct way to determine hand-object interaction. By identifying the scene and combining it with a set of per-object and global actions, we can then determine the most suitable action being performed in each context.
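The closest-object heuristic described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes axis-aligned `(x1, y1, x2, y2)` bounding boxes and Euclidean distance between box centers, and the function names are hypothetical.

```python
from math import hypot

def box_center(box):
    """Center point of an axis-aligned (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def closest_object(hand_box, object_boxes):
    """Index of the object box whose center is nearest the hand-box center."""
    hx, hy = box_center(hand_box)
    distances = [hypot(hx - cx, hy - cy)
                 for cx, cy in map(box_center, object_boxes)]
    return min(range(len(distances)), key=distances.__getitem__)

# Example: a detected hand and two candidate object detections (pixel coords).
hand = (120, 200, 180, 260)
objects = [(400, 50, 460, 110),    # far from the hand
           (150, 230, 210, 290)]   # overlapping the hand region
print(closest_object(hand, objects))  # index of the nearest object
```

A center-distance criterion like this is deliberately cheap; richer cues (box overlap, depth, or hand pose) could refine the interaction decision, but the abstract only claims a rough proximity estimate.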
Pages: 7
Related Papers (50 in total)
  • [41] NeuralHOFusion: Neural Volumetric Rendering under Human-Object Interactions. Jiang, Yuheng; Jiang, Suyi; Sun, Guoxing; Su, Zhuo; Guo, Kaiwen; Wu, Minye; Yu, Jingyi; Xu, Lan. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 6145-6155.
  • [42] Learning Human-Object Interactions by Graph Parsing Neural Networks. Qi, Siyuan; Wang, Wenguan; Jia, Baoxiong; Shen, Jianbing; Zhu, Song-Chun. Computer Vision - ECCV 2018, Pt IX, 2018, 11213: 407-423.
  • [43] Graph-based method for human-object interactions detection. Xia, Li-min; Wu, Wei. Journal of Central South University, 2021, 28 (01): 205-218.
  • [44] Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models. Pi, Huaijin; Peng, Sida; Yang, Minghui; Zhou, Xiaowei; Bao, Hujun. 2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 15015-15027.
  • [45] Action Anticipation Using Pairwise Human-Object Interactions and Transformers. Roy, Debaditya; Fernando, Basura. IEEE Transactions on Image Processing, 2021, 30: 8116-8129.
  • [46] Enhancing Recognition of Human-Object Interaction from Visual Data Using Egocentric Wearable Camera. Hamid, Danish; Ul Haq, Muhammad Ehatisham; Yasin, Amanullah; Murtaza, Fiza; Azam, Muhammad Awais. Future Internet, 2024, 16 (08).
  • [47] Spatial Audio for Human-Object Interactions in Small AR Workspaces. Yang, Jing; Soros, Gabor. MobiSys'18: Proceedings of the 16th ACM International Conference on Mobile Systems, Applications, and Services, 2018: 518-518.
  • [48] Visualizing Thermal Traces to Reveal Histories of Human-Object Interactions. Amemiya, Tomohiro. Universal Access in Human-Computer Interaction, Pt II, Proceedings: Intelligent and Ubiquitous Interaction Environments, 2009, 5615: 477-482.
  • [49] Geometric Features Informed Multi-person Human-Object Interaction Recognition in Videos. Qiao, Tanqiu; Men, Qianhui; Li, Frederick W. B.; Kubotani, Yoshiki; Morishima, Shigeo; Shum, Hubert P. H. Computer Vision - ECCV 2022, Pt IV, 2022, 13664: 474-491.
  • [50] Jointly Recognizing Object Fluents and Tasks in Egocentric Videos. Liu, Yang; Wei, Ping; Zhu, Song-Chun. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 2943-2951.