Human-Object Interaction Detection with Ratio-Transformer

Times Cited: 0
Authors
Wang, Tianlang [1]
Lu, Tao [1]
Fang, Wenhua [1]
Zhang, Yanduo [1]
Affiliations
[1] Wuhan Inst Technol, Hubei Key Lab Intelligent Robot, Sch Comp Sci & Engn, Wuhan 430000, Peoples R China
Source
SYMMETRY-BASEL | 2022, Vol. 14, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
human-object interaction; end-to-end; attention mechanism; transformer; symmetry; sampler; VCOCO;
DOI
10.3390/sym14081666
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Human-object interaction (HOI) detection is a human-centered object detection task that aims to identify the interactions between persons and objects in an image. Previous end-to-end methods have used the attention mechanism of a transformer to spontaneously identify the associations between persons and objects in an image, which effectively improved detection accuracy; however, a transformer increases computational demands and slows down detection. In addition, the end-to-end method can result in an asymmetry between foreground and background information: the foreground data may be significantly less than the background data, yet the latter consumes more computational resources without significantly improving detection accuracy. We therefore proposed an input-controlled transformer, the "ratio-transformer", for the HOI task, which limits the amount of information fed into the transformer by setting a sampling ratio and thereby significantly reduces the computational demand while preserving detection accuracy. The ratio-transformer consists of a sampling module and a transformer network. The sampling module divides the input feature map into foreground and background features; the irrelevant background features are downsampled by a pooling sampler and then fused with the foreground features as the input to the transformer. As a result, the amount of valid data entering the transformer network remains constant while irrelevant information is significantly reduced, maintaining the symmetry between foreground and background information. The proposed network learns both the feature information of each target and the association features between persons and objects, so that its queries can recover the complete HOI triplet. Experiments on the VCOCO dataset showed that the proposed method reduced the computational demand of the transformer by 57% without any loss of accuracy compared to other current HOI methods.
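The abstract's sampling step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-token foreground scores, the `ratio` value, and the fixed number of pooled background tokens (`pool_size`) are all assumptions made here for concreteness.

```python
import numpy as np

def ratio_sample(features, scores, ratio=0.25, pool_size=8):
    """Split N token features into foreground and background by score,
    average-pool the background down to at most `pool_size` tokens,
    and fuse the two sets as the transformer's input.

    features : (N, C) array, a flattened feature map
    scores   : (N,) array of per-token foreground scores (assumed given)
    ratio    : fraction of tokens kept verbatim as foreground
    """
    n = features.shape[0]
    k = max(1, int(n * ratio))
    order = np.argsort(-scores)          # highest-scoring tokens first
    fg = features[order[:k]]             # foreground: kept unchanged
    bg = features[order[k:]]             # background: to be pooled
    pool_size = min(pool_size, bg.shape[0])
    # average-pool the background tokens into `pool_size` groups
    groups = np.array_split(bg, pool_size)
    pooled = np.stack([g.mean(axis=0) for g in groups])
    # fused input: k foreground tokens + pool_size pooled background tokens
    return np.concatenate([fg, pooled], axis=0)

feats = np.random.rand(64, 16)
scores = np.random.rand(64)
out = ratio_sample(feats, scores, ratio=0.25, pool_size=8)
print(out.shape)  # (24, 16): 16 foreground + 8 pooled background tokens
```

Since transformer self-attention cost grows quadratically with token count, shrinking the input from 64 to 24 tokens in this toy example is what yields the computational savings the abstract reports.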
Pages: 10
Related Papers
50 records
  • [1] Agglomerative Transformer for Human-Object Interaction Detection
    Tu, Danyang
    Sun, Wei
    Zhai, Guangtao
    Shen, Wei
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21557 - 21567
  • [2] Enhanced Transformer Interaction Components for Human-Object Interaction Detection
    Zhang, JinHui
    Zhao, Yuxiao
    Zhang, Xian
    Wang, Xiang
    Zhao, Yuxuan
    Wang, Peng
    Hu, Jian
    [J]. ACM SYMPOSIUM ON SPATIAL USER INTERACTION, SUI 2023, 2023,
  • [3] Human-Object Interaction Detection via Disentangled Transformer
    Zhou, Desen
    Liu, Zhichao
    Wang, Jian
    Wang, Leshan
    Hu, Tao
    Ding, Errui
    Wang, Jingdong
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19546 - 19555
  • [4] Rethinking vision transformer through human-object interaction detection
    Cheng, Yamin
    Zhao, Zitian
    Wang, Zhi
    Duan, Hancong
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122
  • [5] Mask-Guided Transformer for Human-Object Interaction Detection
    Ying, Daocheng
    Yang, Hua
    Sun, Jun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [6] Learning Human-Object Interaction Detection via Deformable Transformer
    Cai, Shuang
    Ma, Shiwei
    Gu, Dongzhou
    [J]. 2021 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING, AND ARTIFICIAL INTELLIGENCE, 2021, 12076
  • [7] Compositional Learning in Transformer-Based Human-Object Interaction Detection
    Zhuang, Zikun
    Qian, Ruihao
    Xie, Chi
    Liang, Shuang
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1038 - 1043
  • [8] Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
    Tu, Danyang
    Min, Xiongkuo
    Duan, Huiyu
    Guo, Guodong
    Zhai, Guangtao
    Shen, Wei
    [J]. COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 87 - 103
  • [9] Pairwise CNN-Transformer Features for Human-Object Interaction Detection
    Quan, Hutuo
    Lai, Huicheng
    Gao, Guxue
    Ma, Jun
    Li, Junkai
    Chen, Dongji
    [J]. ENTROPY, 2024, 26 (03)
  • [10] Human-object interaction detection based on disentangled axial attention transformer
    Xia, Limin
    Xiao, Qiyue
    [J]. MACHINE VISION AND APPLICATIONS, 2024, 35 (04)