Explicit Modeling of Human-Object Interactions in Realistic Videos

Cited by: 68
Authors
Prest, Alessandro [1, 2]
Ferrari, Vittorio [3]
Schmid, Cordelia [4]
Affiliations
[1] ETH, Comp Vis Lab, CH-8092 Zurich, Switzerland
[2] INRIA, LEAR Team, Grenoble, France
[3] Univ Edinburgh, IPAB Inst, CALVIN, Edinburgh EH8 9LE, Midlothian, Scotland
[4] INRIA Rhone Alpes, LEAR Team, F-38334 Saint Ismier, France
Funding
Swiss National Science Foundation
Keywords
Action recognition; human-object interaction; video analysis
DOI
10.1109/TPAMI.2012.175
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person's position. Our approach relies on state-of-the-art techniques for human detection [32], object detection [10], and tracking [39]. We show that this yields human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object with respect to the human. Experimental results on the Coffee and Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that 1) our explicit human-object model is an informative cue for action recognition; and 2) it is complementary to traditional low-level descriptors such as 3D-HOG [23] extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves over the performance of each feature type alone, as well as over the state of the art [23], [29].
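The abstract describes the core feature only at a high level: the object's trajectory expressed relative to the person. A minimal sketch of such a feature follows, assuming each track is a list of per-frame bounding boxes `(x_min, y_min, x_max, y_max)` in image coordinates (y grows downward), already produced by the detection and tracking stages. The normalization by person-box height, the frame-to-frame displacement step, and the weighted-sum `late_fusion` helper are illustrative assumptions, not the paper's exact descriptor or its scheme for combining with 3D-HOG; all function names are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def center(b: Box) -> Point:
    """Center of a bounding box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def relative_trajectory(person_track: List[Box], object_track: List[Box]) -> List[Point]:
    """Per-frame offset of the object center from the person center.

    Hypothetical choice: offsets are divided by the person-box height so the
    feature is roughly independent of the person's distance to the camera.
    Both tracks are assumed to be aligned frame by frame.
    """
    if len(person_track) != len(object_track):
        raise ValueError("person and object tracks must cover the same frames")
    rel: List[Point] = []
    for p, o in zip(person_track, object_track):
        (px, py), (ox, oy) = center(p), center(o)
        height = p[3] - p[1]
        if height <= 0:
            raise ValueError("person box must have positive height")
        rel.append(((ox - px) / height, (oy - py) / height))
    return rel


def displacement_features(rel: List[Point]) -> List[Point]:
    """Frame-to-frame motion of the object relative to the person."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(rel, rel[1:])]


def late_fusion(score_a: float, score_b: float, weight: float = 0.5) -> float:
    """Assumed combination: weighted sum of two classifiers' scores
    (e.g. human-object features and 3D-HOG) for one action class."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * score_a + (1.0 - weight) * score_b
```

For example, a person box `(100, 50, 200, 250)` held fixed over two frames, with an object moving from `(180, 90, 200, 110)` to `(170, 40, 190, 60)`, gives relative positions of about `(0.2, -0.25)` and `(0.15, -0.5)`: the object rises by a quarter of the person's height, as when lifting a cup toward the face. Dividing by box height keeps this value comparable between a person near the camera and one far away.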
Pages: 835-848
Number of pages: 14