Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection

Cited by: 13
Authors
Liang, Zhijun [1 ]
Liu, Junfa [1 ]
Guan, Yisheng [1 ]
Rojas, Juan [2 ]
Affiliations
[1] Guangdong University of Technology, School of Electromechanical Engineering, Biomimetic and Intelligent Robotics Lab (BIRL), Guangzhou 510006, People's Republic of China
[2] The Chinese University of Hong Kong, Department of Mechanical and Automation Engineering, Hong Kong, China
DOI
10.1109/ROBIO54168.2021.9739429
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In scene understanding, robots benefit from not only detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet. Contextual information has been found critical in inferring interactions. However, most works only use local features from single human-object pairs for inference. Few works have studied the disambiguating contribution of subsidiary relations made available via graph networks. Similarly, few have leveraged visual cues with the intrinsic semantic regularities embedded in HOIs. We contribute Visual-Semantic Graph Attention Networks (VS-GATs): a dual-graph attention network that effectively aggregates visual, spatial, and semantic contextual information dynamically from primary human-object relations as well as subsidiary relations through attention mechanisms for strong disambiguating power. We achieve competitive results on two benchmarks: V-COCO and HICO-DET. The code is available at https://github.com/birlrobotics/vs-gats.
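To make the attention-based aggregation described in the abstract concrete, below is a minimal single-head graph-attention sketch in PyTorch. It is an illustrative assumption, not the authors' implementation: the class name, dimensions, and the fully connected instance graph are placeholders, and the released code at https://github.com/birlrobotics/vs-gats is the authoritative reference.

```python
# Minimal sketch of one graph-attention aggregation step, loosely in the
# spirit of VS-GATs' context aggregation. All names and shapes here are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head attention over a fully connected instance graph.

    Each node (a detected human or object) attends to every node, so
    primary human-object pairs and subsidiary relations both contribute
    to the updated node features.
    """
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # Edge score computed from the concatenated endpoint features.
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features, e.g., ROI-pooled visual features
        # concatenated with word embeddings of the instance labels.
        h = self.proj(x)                                   # (N, out_dim)
        n = h.size(0)
        # Pairwise concatenation h_i || h_j for every edge (i, j).
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=1)    # attention weights per node
        return F.relu(alpha @ h)           # context-aggregated features

if __name__ == "__main__":
    layer = GraphAttentionLayer(in_dim=1024, out_dim=256)
    nodes = torch.randn(5, 1024)   # e.g., 2 humans + 3 objects in a scene
    print(layer(nodes).shape)      # torch.Size([5, 256])
```

In the full model, the abstract indicates two such graphs (visual-spatial and semantic) are combined before predicting the predicate for each human-object pair; this sketch only shows the attention-weighted aggregation that lets subsidiary relations influence each node's updated features.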
Pages: 1441-1447 (7 pages)