Explicit Modeling of Human-Object Interactions in Realistic Videos

Cited by: 68
Authors
Prest, Alessandro [1, 2]
Ferrari, Vittorio [3]
Schmid, Cordelia [4]
Affiliations
[1] ETH, Comp Vis Lab, CH-8092 Zurich, Switzerland
[2] INRIA, LEAR Team, Grenoble, France
[3] Univ Edinburgh, IPAB Inst, CALVIN, Edinburgh EH8 9LE, Midlothian, Scotland
[4] INRIA Rhone Alpes, LEAR Team, F-38334 Saint Ismier, France
Funding
Swiss National Science Foundation
Keywords
Action recognition; human-object interaction; video analysis
DOI
10.1109/TPAMI.2012.175
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object w.r.t. the person position. Our approach relies on state-of-the-art techniques for human detection [32], object detection [10], and tracking [39]. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object w.r.t. the human. Experimental results on the Coffee and Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that 1) our explicit human-object model is an informative cue for action recognition; and 2) it is complementary to traditional low-level descriptors such as 3D-HOG [23] extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves over their individual performance as well as over the state of the art [23], [29].
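The abstract represents an action as the trajectory of the object relative to the person. The Python sketch below shows one plausible way to turn two temporally aligned tracks into such a relative trajectory. It assumes per-frame axis-aligned boxes for the person and the object, and it normalizes the object-to-person center offset by the person's height so the feature does not depend on how large the person appears in the frame. The normalization choice, the relative-area term, and the names `Box`, `relative_trajectory` and `fuse_scores` are hypothetical illustrations, not the descriptor defined in the paper. The `fuse_scores` helper shows only a generic weighted late fusion of two classifier scores; the abstract does not say how the paper combines its features with 3D-HOG.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Axis-aligned box in pixels (hypothetical representation of a track entry)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def relative_trajectory(
    person_track: List[Box], object_track: List[Box]
) -> List[Tuple[float, float, float]]:
    """Per-frame (dx, dy, relative area) of the object w.r.t. the person.

    Assumption: both tracks are already temporally aligned, one box per frame.
    """
    if len(person_track) != len(object_track):
        raise ValueError("tracks must be temporally aligned (same length)")
    feats = []
    for p, o in zip(person_track, object_track):
        pcx, pcy = p.center()
        ocx, ocy = o.center()
        # Offset of the object center from the person center, in units of
        # person height (hypothetical normalization for scale invariance).
        dx = (ocx - pcx) / p.h
        dy = (ocy - pcy) / p.h
        # Object area relative to person area (hypothetical extra cue).
        rel_area = (o.w * o.h) / (p.w * p.h)
        feats.append((dx, dy, rel_area))
    return feats


def fuse_scores(hoi_score: float, hog_score: float, weight: float = 0.5) -> float:
    """Generic late fusion: weighted sum of two classifier scores.

    The weighting scheme is an assumption, not the paper's combination method.
    """
    return weight * hoi_score + (1.0 - weight) * hog_score
```

For example, with a person box `Box(100, 50, 60, 180)` and an object box `Box(150, 200, 20, 20)` in the same frame, the object center lies 30 px to the right of and 70 px below the person center. This yields `dx = 30/180` and `dy = 70/180`. A sequence of such triples over a track is what a classifier would then consume.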
Pages: 835-848
Number of pages: 14
Related papers (50 in total)
  • [1] Predicting Human-Object Interactions in Egocentric Videos
    Benavent-Lledo, Manuel
    Oprea, Sergiu
    Castro-Vargas, John Alejandro
    Mulero-Perez, David
    Garcia-Rodriguez, Jose
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [2] Detecting human-object interactions in videos by modeling the trajectory of objects and human skeleton
    Li, Qiyue
    Xie, Xuemei
    Zhang, Chen
    Zhang, Jin
    Shi, Guangming
    [J]. NEUROCOMPUTING, 2022, 509 : 234 - 243
  • [3] Skew-Robust Human-Object Interactions in Videos
    Agarwal, Apoorva
    Dabral, Rishabh
    Jain, Arjun
    Ramakrishnan, Ganesh
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 5087 - 5096
  • [4] Detecting Human-Object Relationships in Videos
    Ji, Jingwei
    Desai, Rishi
    Niebles, Juan Carlos
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 8086 - 8096
  • [5] Spatio-Temporal Human-Object Interactions for Action Recognition in Videos
    Escorcia, Victor
    Niebles, Juan Carlos
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013: 508 - 514
  • [6] Modeling 4D Human-Object Interactions for Event and Object Recognition
    Wei, Ping
    Zhao, Yibiao
    Zheng, Nanning
    Zhu, Song-Chun
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013: 3272 - 3279
  • [7] Learning a Generative Model for Multi-Step Human-Object Interactions from Videos
    Wang, He
    Pirk, Soren
    Yumer, Ersin
    Kim, Vladimir G.
    Sener, Ozan
    Sridhar, Srinath
    Guibas, Leonidas J.
    [J]. COMPUTER GRAPHICS FORUM, 2019, 38 (02) : 367 - 378
  • [8] Learning Asynchronous and Sparse Human-Object Interaction in Videos
    Morais, Romero
    Le, Vuong
    Venkatesh, Svetha
    Tran, Truyen
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 16036 - 16045
  • [9] Learning to Detect Human-Object Interactions
    Chao, Yu-Wei
    Liu, Yunfan
    Liu, Xieyang
    Zeng, Huayi
    Deng, Jia
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018: 381 - 389
  • [10] Detecting and Recognizing Human-Object Interactions
    Gkioxari, Georgia
    Girshick, Ross
    Dollar, Piotr
    He, Kaiming
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 8359 - 8367