Modeling 4D Human-Object Interactions for Joint Event Segmentation, Recognition, and Object Localization

Cited by: 44
Authors
Wei, Ping [1, 2]
Zhao, Yibiao [3]
Zheng, Nanning [1]
Zhu, Song-Chun [3]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
[2] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
Keywords
Human-object interaction; object affordance; event recognition; sequence segmentation; object localization; AFFORDANCES; GEOMETRY
DOI
10.1109/TPAMI.2016.2574712
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization. The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions of human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. On the time axis, the interactions are represented as a sequence of atomic event transitions with coherent objects. The 4DHOI model is a hierarchical spatial-temporal graph representation which can be used for inferring scene functionality and object affordance. The graph structures and parameters are learned using an ordered expectation maximization algorithm which mines the spatial-temporal structures of events from RGB-D video samples. Given an input RGB-D video, the inference is performed by a dynamic programming beam search algorithm which simultaneously carries out event segmentation, recognition, and object localization. We collected a large multiview RGB-D event dataset which contains 3,815 video sequences and 383,036 RGB-D frames captured by three RGB-D cameras. The experimental results on three challenging datasets demonstrate the strength of the proposed method.
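The abstract says that inference uses a dynamic programming beam search, which segments and labels the video in a single pass. The sketch below is only a rough illustration of that general idea: beam-pruned dynamic programming over segmentations. It assumes that per-frame event scores and pairwise event-transition scores are already available. It is not the authors' algorithm. It leaves out object localization, the hierarchical 4DHOI graph, and any segment-length prior. The names `beam_segment`, `frame_scores`, and `transition` are hypothetical.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

# A segment is (start_frame, end_frame_exclusive, event_label).
Segment = Tuple[int, int, int]


def beam_segment(
    frame_scores: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
    min_len: int = 1,
    max_len: Optional[int] = None,
    beam_width: int = 5,
) -> Tuple[float, List[Segment]]:
    """Jointly segment and label a sequence with beam-pruned dynamic programming.

    Hypothetical inputs (assumptions, not taken from the paper):
      frame_scores[t][k]: log-score that frame t belongs to event k.
      transition[a][b]:   log-score for event a being followed by event b.
    """
    T = len(frame_scores)
    if T == 0:
        return 0.0, []
    K = len(frame_scores[0])
    max_len = T if max_len is None else max_len

    # Prefix sums let us score any segment [start, end) for label k in O(1).
    prefix = [[0.0] * K]
    for t in range(T):
        prefix.append([prefix[-1][k] + frame_scores[t][k] for k in range(K)])

    # beams[e] keeps at most beam_width partial parses covering frames [0, e).
    beams: List[List[Tuple[float, Tuple[Segment, ...]]]] = [[] for _ in range(T + 1)]
    beams[0] = [(0.0, ())]

    for end in range(1, T + 1):
        candidates = []
        # Try every admissible length for the last segment ending at `end`.
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            for score, segs in beams[start]:
                for k in range(K):
                    seg_score = prefix[end][k] - prefix[start][k]
                    # Transition score from the previous segment's label, if any.
                    trans = transition[segs[-1][2]][k] if segs else 0.0
                    candidates.append((score + seg_score + trans, segs + ((start, end, k),)))
        # Beam pruning: keep only the highest-scoring hypotheses ending at this frame.
        beams[end] = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])

    if not beams[T]:
        raise ValueError("no segmentation satisfies the length constraints")
    best_score, best_segs = beams[T][0]
    return best_score, list(best_segs)


if __name__ == "__main__":
    scores = [[2, 0], [2, 0], [2, 0], [0, 2], [0, 2], [0, 2]]
    trans = [[-5, 0], [0, -5]]  # penalize splitting one event into two segments
    print(beam_segment(scores, trans))  # (12.0, [(0, 3, 0), (3, 6, 1)])
```

With a wide enough beam, the search enumerates every hypothesis and returns the exact best segmentation. A small `beam_width` gives up that guarantee in exchange for speed, which is the usual reason for adding beam pruning to a dynamic program like this one.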
Pages: 1165-1179
Number of pages: 15
Related papers (50 in total)
  • [1] Modeling 4D Human-Object Interactions for Event and Object Recognition
    Wei, Ping
    Zhao, Yibiao
    Zheng, Nanning
    Zhu, Song-Chun
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 3272 - 3279
  • [2] Recognizing Human-Object Interactions via Target Localization
    Cho, Sunyoung
    Park, Jihun
    Shin, Young Sook
    Lee, Sang-ho
    [J]. 2018 18TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2018, : 836 - 840
  • [3] Exemplar-Based Recognition of Human-Object Interactions
    Hu, Jian-Fang
    Zheng, Wei-Shi
    Lai, Jianhuang
    Gong, Shaogang
    Xiang, Tao
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2016, 26 (04) : 647 - 660
  • [4] Human-Object Interaction Recognition Based on Modeling Context
    Li, Shuyang
    Liang, Wei
    Zhang, Qun
    [J]. Journal of Beijing Institute of Technology, 2017, 26 (02) : 215 - 222
  • [5] Explicit Modeling of Human-Object Interactions in Realistic Videos
    Prest, Alessandro
    Ferrari, Vittorio
    Schmid, Cordelia
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (04) : 835 - 848
  • [6] Human-Object Interaction Recognition Based on Modeling Context
    Li, Shuyang
    Liang, Wei
    Zhang, Qun
    [J]. Journal of Beijing Institute of Technology (English Edition), 2017, 26 (02) : 215 - 222
  • [7] Novel Anomalous Event Detection based on Human-object Interactions
    Colque, Rensso Mora
    Caetano, Carlos
    de Melo, Victor C.
    Chavez, Guillermo Camara
    Schwartz, William Robson
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2018), VOL 5: VISAPP, 2018, : 293 - 300
  • [8] A new Bayesian modeling for 3D human-object action recognition
    Maurice, Camille
    Madrigal, Francisco
    Monin, Andre
    Lerasle, Frederic
    [J]. 2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
  • [9] Cascaded Human-Object Interaction Recognition
    Zhou, Tianfei
    Wang, Wenguan
    Qi, Siyuan
    Ling, Haibin
    Shen, Jianbing
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4262 - 4271
  • [10] Detecting and Recognizing Human-Object Interactions
    Gkioxari, Georgia
    Girshick, Ross
    Dollar, Piotr
    He, Kaiming
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8359 - 8367