HOTR: End-to-End Human-Object Interaction Detection with Transformers

Cited by: 164
Authors
Kim, Bumsoo [1, 2]
Lee, Junhyun [2]
Kang, Jaewoo [2]
Kim, Eun-Sol [1]
Kim, Hyunwoo J. [2]
Affiliations
[1] Kakao Brain, Seongnam, South Korea
[2] Korea Univ, Seoul, South Korea
Funding
National Research Foundation of Singapore
DOI
10.1109/CVPR46437.2021.00014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image, which involves i) localizing the subject (i.e., humans) and target (i.e., objects) of each interaction, and ii) classifying the interaction labels. Most existing methods have addressed this task indirectly, by detecting human and object instances and then running inference on every pair of detected instances individually. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of ⟨human, object, interaction⟩ triplets from an image based on a transformer encoder-decoder architecture. Through set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks, with an inference time under 1 ms after object detection.
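The set-prediction idea in the abstract can be illustrated with a minimal sketch of DETR-style bipartite matching between predicted and ground-truth ⟨human, object, interaction⟩ triplets. Everything here is an illustrative assumption, not HOTR's actual formulation: the normalized (x1, y1, x2, y2) box format, the matching cost (L1 box distance for the human and object boxes plus a label-mismatch penalty), and the names `pair_cost` and `match`. Exhaustive search over assignments is used only to keep the sketch dependency-free; practical implementations use the Hungarian algorithm.

```python
from itertools import permutations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: normalized (x1, y1, x2, y2)
Triplet = Tuple[Box, Box, str]           # (human box, object box, interaction label)


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_cost(pred: Triplet, gt: Triplet, label_weight: float = 1.0) -> float:
    """Hypothetical matching cost: human-box L1 + object-box L1 + label-mismatch penalty."""
    h_p, o_p, v_p = pred
    h_g, o_g, v_g = gt
    return l1(h_p, h_g) + l1(o_p, o_g) + label_weight * (v_p != v_g)


def match(preds: List[Triplet], gts: List[Triplet]):
    """Optimal one-to-one assignment of ground truths to predictions.

    Exhaustive search over all assignments, so only suitable for tiny sets.
    Returns ([(gt_index, pred_index), ...], total_cost).
    """
    assert len(preds) >= len(gts), "need at least as many predictions as ground truths"
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, list(enumerate(perm))
    return best, best_cost


# Toy example: three predicted triplets, two ground-truth triplets.
preds = [
    ((0.0, 0.0, 0.5, 1.0), (0.5, 0.5, 1.0, 1.0), "ride"),
    ((0.1, 0.1, 0.4, 0.9), (0.6, 0.2, 0.9, 0.5), "hold"),
    ((0.0, 0.0, 0.1, 0.1), (0.9, 0.9, 1.0, 1.0), "no_interaction"),
]
gts = [
    ((0.1, 0.1, 0.4, 0.9), (0.6, 0.2, 0.9, 0.5), "hold"),
    ((0.0, 0.0, 0.5, 1.0), (0.5, 0.5, 1.0, 1.0), "ride"),
]
pairs, total = match(preds, gts)
print(pairs, total)
```

Because every ground-truth triplet is matched to exactly one prediction during training, a set predictor of this kind has no duplicate pairs to suppress at inference time. This is the property the abstract credits for removing the post-processing bottleneck.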
Pages: 74-83
Page count: 10