Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network

Cited by: 6
Authors
Xing, Hao [1]
Burschka, Darius [1]
Affiliations
[1] Tech Univ Munich, Machine Vis & Percept Grp, Dept Comp Sci, Arcisstr 21, D-80333 Munich, Germany
DOI
10.1109/IROS47612.2022.9981771
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human activity recognition is an important task for intelligent robots, especially in human-robot collaboration, where it requires not only the labels of sub-activities but also the temporal structure of the activity. To automatically recognize both the labels and the temporal structure in sequences of human-object interaction, we propose a novel Pyramid Graph Convolutional Network (PGCN). It employs a pyramidal encoder-decoder architecture consisting of an attention-based graph convolutional network and a temporal pyramid pooling module, which downsample and upsample the interaction sequence along the temporal axis, respectively. The system represents the 2D or 3D spatial relations between humans and objects, taken from detection results in video data, as a graph. To learn the human-object relations, a new attention graph convolutional network is trained to extract condensed information from the graph representation. To segment actions into sub-actions, a novel temporal pyramid pooling module is proposed, which upsamples the compressed features back to the original time scale and classifies actions per frame. We explore various attention layers, namely spatial attention, temporal attention, and channel attention, and combine them with different upsampling decoders to test performance on action recognition and segmentation. We evaluate our model on two challenging datasets in the field of human-object interaction recognition: the Bimanual Actions and IKEA Assembly datasets. We demonstrate that our classifier significantly improves both framewise action recognition and segmentation; for example, the F1 micro and F1@50 scores on the Bimanual Actions dataset are improved by 4.3% and 8.5%, respectively.
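The data flow described in the abstract (per-frame graph of human and object positions, graph convolution, temporal downsampling, upsampling back to the original length, per-frame classification) can be illustrated with a minimal NumPy sketch. This is not the authors' PGCN implementation: the Gaussian distance kernel, the average pooling, the nearest-neighbour upsampling, the random weights, and all function names and sizes below are assumptions chosen for illustration, and the attention layers are omitted.

```python
import numpy as np

def normalized_adjacency(centers, sigma=1.0):
    """Build a symmetric normalized adjacency from node positions.

    centers: (N, D) positions of human/object nodes in one frame (D = 2 or 3).
    The Gaussian kernel on pairwise distance is an illustrative assumption.
    """
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) pairwise distances
    adj = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))  # diagonal is 1 (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def graph_conv(x, adj, weight):
    """One graph convolution with ReLU: x (T, N, C_in), adj (T, N, N)."""
    return np.maximum(adj @ x @ weight, 0.0)

def temporal_downsample(x, stride=2):
    """Average-pool along time: (T, F) -> (ceil(T / stride), F)."""
    pad = (-x.shape[0]) % stride
    if pad:  # repeat the last frame so the length divides evenly
        x = np.concatenate([x, np.repeat(x[-1:], pad, axis=0)], axis=0)
    return x.reshape(-1, stride, x.shape[1]).mean(axis=1)

def temporal_upsample(x, target_len):
    """Nearest-neighbour upsampling along time back to target_len frames."""
    idx = np.arange(target_len) * x.shape[0] // target_len
    return x[idx]

# Hypothetical sizes: 10 frames, 4 nodes with 3D positions, 5 action classes.
rng = np.random.default_rng(0)
T, N, C_in, C_hid, n_classes = 10, 4, 3, 8, 5
centers = rng.normal(size=(T, N, C_in))              # stand-in for detections
adj = np.stack([normalized_adjacency(c) for c in centers])

feats = graph_conv(centers, adj, rng.normal(size=(C_in, C_hid)))
frame_feats = feats.reshape(T, -1)                   # (T, N * C_hid)

coarse = temporal_downsample(frame_feats)            # encoder: 10 -> 5 frames
coarser = temporal_downsample(coarse)                # encoder: 5 -> 3 frames
restored = temporal_upsample(coarser, T)             # decoder: 3 -> 10 frames

logits = restored @ rng.normal(size=(N * C_hid, n_classes))
labels = logits.argmax(axis=1)                       # one label per frame
```

Because the decoder restores the original number of frames, the output contains one label per input frame, which is what makes framewise segmentation into sub-actions possible.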
Pages: 5195-5201
Number of pages: 7