Polysemy Deciphering Network for Robust Human-Object Interaction Detection

Citations: 30
Authors
Zhong, Xubin [1]
Ding, Changxing [1, 2]
Qu, Xian [1]
Tao, Dacheng [3]
Affiliations
[1] South China University of Technology, School of Electronic and Information Engineering, Guangzhou 510000, People's Republic of China
[2] Pazhou Lab, Guangzhou 510330, People's Republic of China
[3] JD.com, JD Explore Academy, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Human-object interaction; Verb polysemy; Language priors; Attention model
DOI
10.1007/s11263-021-01458-8
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human-Object Interaction (HOI) detection is important to human-centric scene understanding tasks. Existing works tend to assume that the same verb has similar visual characteristics in different HOI categories, an approach that ignores the diverse semantic meanings of the verb. To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net) that decodes the visual polysemy of verbs for HOI detection in three distinct ways. First, we refine features for HOI detection to be polysemy-aware through the use of two novel modules: namely, Language Prior-guided Channel Attention (LPCA) and Language Prior-based Feature Augmentation (LPFA). LPCA highlights important elements in human and object appearance features for each HOI category to be identified; moreover, LPFA augments human pose and spatial features for HOI detection using language priors, enabling the verb classifiers to receive language hints that reduce intra-class variation for the same verb. Second, we introduce a novel Polysemy-Aware Modal Fusion module, which guides PD-Net to make decisions based on feature types deemed more important according to the language priors. Third, we propose to relieve the verb polysemy problem through sharing verb classifiers for semantically similar HOI categories. Furthermore, to expedite research on the verb polysemy problem, we build a new benchmark dataset named HOI-VerbPolysemy (HOI-VP), which includes common verbs (predicates) that have diverse semantic meanings in the real world. Finally, through deciphering the visual polysemy of verbs, our approach is demonstrated to outperform state-of-the-art methods by significant margins on the HICO-DET, V-COCO, and HOI-VP databases. Code and data in this paper are available at .
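The idea behind LPCA, as the abstract describes it, is that a language prior for the HOI category highlights important channels of an appearance feature. The sketch below is only an illustration of that general idea, not the authors' implementation: the two-layer gating network, the ReLU and sigmoid choices, the dimensions, the random weights, and every name (`channel_attention`, `W1`, `W2`, `prior_a`, `prior_b`) are assumptions introduced here. In a real system the prior would be a learned word embedding and the weights would be trained.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def channel_attention(feature, prior, W1, W2):
    """Reweight each channel of `feature` with gates computed from `prior`.

    Hypothetical gating network: prior -> ReLU(W1 @ prior) -> sigmoid(W2 @ hidden).
    Returns the reweighted feature and the per-channel gates.
    """
    hidden = [max(0.0, h) for h in matvec(W1, prior)]  # ReLU
    gates = [sigmoid(g) for g in matvec(W2, hidden)]   # one gate per channel
    return [g * f for g, f in zip(gates, feature)], gates


def random_matrix(rows, cols, rng):
    """Untrained stand-in weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, D, H = 6, 4, 3            # channels, prior size, hidden size (all assumed)
W1 = random_matrix(H, D, rng)
W2 = random_matrix(C, H, rng)

feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy appearance feature
prior_a = [1.0, 0.0, 0.5, -0.5]           # toy prior for one HOI category
prior_b = [-1.0, 0.5, 0.0, 1.0]           # toy prior for another category

out_a, gates_a = channel_attention(feature, prior_a, W1, W2)
out_b, gates_b = channel_attention(feature, prior_b, W1, W2)
print(gates_a)
print(gates_b)
```

The same appearance feature is gated once per candidate HOI category, so a verb classifier would see a category-specific view of the feature rather than one shared representation for every meaning of the verb.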
Pages: 1910-1929
Page count: 20
Related papers
50 in total
  • [1] Polysemy Deciphering Network for Robust Human–Object Interaction Detection
    Xubin Zhong
    Changxing Ding
    Xian Qu
    Dacheng Tao
    International Journal of Computer Vision, 2021, 129 : 1910 - 1929
  • [2] An Improved Human-Object Interaction Detection Network
    Gao, Song
    Wang, Hongyu
    Song, Jilai
    Xu, Fang
    Zou, Fengshan
    PROCEEDINGS OF 2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (IEEE-ASID'2019), 2019, : 192 - 196
  • [3] Parallel disentangling network for human-object interaction detection
    Cheng, Yamin
    Duan, Hancong
    Wang, Chen
    Chen, Zhijun
    PATTERN RECOGNITION, 2024, 146
  • [4] Semantic Inference Network for Human-Object Interaction Detection
    Liu, Hongyi
    Mo, Lisha
    Ma, Huimin
    IMAGE AND GRAPHICS, ICIG 2019, PT I, 2019, 11901 : 518 - 529
  • [5] Hierarchical Reasoning Network for Human-Object Interaction Detection
    Gao, Yiming
    Kuang, Zhanghui
    Li, Guanbin
    Zhang, Wayne
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 8306 - 8317
  • [6] ERNet: An Efficient and Reliable Human-Object Interaction Detection Network
    Lim, JunYi
    Baskaran, Vishnu Monn
    Lim, Joanne Mun-Yee
    Wong, KokSheik
    See, John
    Tistarelli, Massimo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 964 - 979
  • [7] Multi-stream Network for Human-object Interaction Detection
    Wang, Chang
    Sun, Jinyu
    Ma, Shiwei
    Lu, Yuqiu
    Liu, Wang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (08)
  • [8] Pose graph parsing network for human-object interaction detection
    Su, Zhan
    Wang, Yuting
    Xie, Qing
    Yu, Ruiyun
    NEUROCOMPUTING, 2022, 476 : 53 - 62
  • [9] Human-Centric Parsing Network for Human-Object Interaction Detection
    Chen, Guanyu
    Chen, Chong
    Zhao, Zhicheng
    Su, Fei
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5488 - 5494
  • [10] Relation Parsing Neural Network for Human-Object Interaction Detection
    Zhou, Penghao
    Chi, Mingmin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 843 - 851