Human-Object Interaction Detection with Ratio-Transformer

Times Cited: 0
Authors
Wang, Tianlang [1]
Lu, Tao [1]
Fang, Wenhua [1]
Zhang, Yanduo [1]
Affiliations
[1] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Sch Comp Sci & Engn, Wuhan 430000, Peoples R China
Source
SYMMETRY-BASEL | 2022, Vol. 14, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
human-object interaction; end-to-end; attention mechanism; transformer; symmetry; sampler; VCOCO;
DOI
10.3390/sym14081666
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Human-object interaction (HOI) detection is a human-centered object detection task that aims to identify the interactions between persons and objects in an image. Previous end-to-end methods have used the attention mechanism of a transformer to spontaneously identify the associations between persons and objects in an image, which effectively improved detection accuracy; however, a transformer increases computational demands and slows down detection. In addition, the end-to-end method can result in an asymmetry between foreground and background information: the foreground data may be significantly less than the background data, yet the latter consumes more computational resources without significantly improving detection accuracy. We therefore proposed an input-controlled transformer, the "ratio-transformer", for the HOI task, which limits the amount of information fed into the transformer by setting a sampling ratio and thereby significantly reduces the computational demand while preserving detection accuracy. The ratio-transformer consists of a sampling module and a transformer network. The sampling module divides the input feature map into foreground and background features; the irrelevant background features are downsampled by a pooling sampler and then fused with the foreground features as the input to the transformer. As a result, the amount of valid data entering the transformer network remains constant while irrelevant information is significantly reduced, maintaining the symmetry between foreground and background information. The proposed network learns both the feature information of each target and the association features between persons and objects, so that its queries can recover the complete HOI triplet. Experiments on the VCOCO dataset showed that the proposed method reduced the computational demand of the transformer by 57% without any loss of accuracy compared to other current HOI methods.
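The abstract's sampling step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-token foreground scores, the `ratio` value, and the fixed number of pooled background tokens (`pool_size`) are all assumptions made here for concreteness.

```python
import numpy as np

def ratio_sample(features, scores, ratio=0.25, pool_size=8):
    """Split N token features into foreground and background by score,
    average-pool the background down to at most `pool_size` tokens,
    and fuse the two sets as the transformer's input.

    features : (N, C) array, a flattened feature map
    scores   : (N,) array of per-token foreground scores (assumed given)
    ratio    : fraction of tokens kept verbatim as foreground
    """
    n = features.shape[0]
    k = max(1, int(n * ratio))
    order = np.argsort(-scores)          # highest-scoring tokens first
    fg = features[order[:k]]             # foreground: kept unchanged
    bg = features[order[k:]]             # background: to be pooled
    pool_size = min(pool_size, bg.shape[0])
    # average-pool the background tokens into `pool_size` groups
    groups = np.array_split(bg, pool_size)
    pooled = np.stack([g.mean(axis=0) for g in groups])
    # fused input: k foreground tokens + pool_size pooled background tokens
    return np.concatenate([fg, pooled], axis=0)

feats = np.random.rand(64, 16)
scores = np.random.rand(64)
out = ratio_sample(feats, scores, ratio=0.25, pool_size=8)
print(out.shape)  # (24, 16): 16 foreground + 8 pooled background tokens
```

Since transformer self-attention cost grows quadratically with token count, shrinking the input from 64 to 24 tokens in this toy example is what yields the computational savings the abstract reports.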
Pages: 10
Related Papers
50 records
  • [1] Agglomerative Transformer for Human-Object Interaction Detection
    Tu, Danyang
    Sun, Wei
    Zhai, Guangtao
    Shen, Wei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21557 - 21567
  • [2] Enhanced Transformer Interaction Components for Human-Object Interaction Detection
    Zhang, JinHui
    Zhao, Yuxiao
    Zhang, Xian
    Wang, Xiang
    Zhao, Yuxuan
    Wang, Peng
    Hu, Jian
    [J]. ACM SYMPOSIUM ON SPATIAL USER INTERACTION, SUI 2023, 2023,
  • [3] Human-Object Interaction Detection via Disentangled Transformer
    Zhou, Desen
    Liu, Zhichao
    Wang, Jian
    Wang, Leshan
    Hu, Tao
    Ding, Errui
    Wang, Jingdong
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19546 - 19555
  • [4] Rethinking vision transformer through human-object interaction detection
    Cheng, Yamin
    Zhao, Zitian
    Wang, Zhi
    Duan, Hancong
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122
  • [5] Mask-Guided Transformer for Human-Object Interaction Detection
    Ying, Daocheng
    Yang, Hua
    Sun, Jun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [6] Learning Human-Object Interaction Detection via Deformable Transformer
    Cai, Shuang
    Ma, Shiwei
    Gu, Dongzhou
    [J]. 2021 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING, AND ARTIFICIAL INTELLIGENCE, 2021, 12076
  • [7] Compositional Learning in Transformer-Based Human-Object Interaction Detection
    Zhuang, Zikun
    Qian, Ruihao
    Xie, Chi
    Liang, Shuang
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1038 - 1043
  • [8] Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
    Tu, Danyang
    Min, Xiongkuo
    Duan, Huiyu
    Guo, Guodong
    Zhai, Guangtao
    Shen, Wei
    [J]. COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 87 - 103
  • [9] Pairwise CNN-Transformer Features for Human-Object Interaction Detection
    Quan, Hutuo
    Lai, Huicheng
    Gao, Guxue
    Ma, Jun
    Li, Junkai
    Chen, Dongji
    [J]. ENTROPY, 2024, 26 (03)
  • [10] Human-object interaction detection based on disentangled axial attention transformer
    Xia, Limin
    Xiao, Qiyue
    [J]. MACHINE VISION AND APPLICATIONS, 2024, 35 (04)