HOTR: End-to-End Human-Object Interaction Detection with Transformers

Cited by: 164
Authors
Kim, Bumsoo [1, 2]
Lee, Junhyun [2]
Kang, Jaewoo [2]
Kim, Eun-Sol [1]
Kim, Hyunwoo J. [2]
Affiliations
[1] Kakao Brain, Seongnam, South Korea
[2] Korea Univ, Seoul, South Korea
Funding
National Research Foundation, Singapore;
DOI
10.1109/CVPR46437.2021.00014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods have addressed this task indirectly by detecting human and object instances and individually inferring every pair of the detected instances. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image based on a transformer encoder-decoder architecture. Through set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.
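The contrast drawn in the abstract, between scoring every detected pair and directly predicting a set of triplets, can be illustrated with a minimal Python sketch. This is not the HOTR implementation: the names `Triplet`, `pairwise_candidates`, and `decode_set`, the use of `None` as a "no interaction" slot, and the example values are all assumptions made here for illustration only.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Set, Tuple


class Triplet(NamedTuple):
    """A <human, object, interaction> triplet (hypothetical representation)."""
    human: int        # index of a detected human instance
    obj: int          # index of a detected object instance
    interaction: str  # interaction label


def pairwise_candidates(num_humans: int, num_objects: int) -> List[Tuple[int, int]]:
    """Indirect approach: enumerate every (human, object) pair.

    Each pair would then be scored individually, so the number of
    candidates grows as num_humans * num_objects.
    """
    return list(product(range(num_humans), range(num_objects)))


def decode_set(slots: List[Optional[Triplet]]) -> Set[Triplet]:
    """Set-prediction approach (assumed form): a fixed number of output
    slots, each holding a triplet or None for "no interaction".

    The final result is simply the set of non-empty slots.
    """
    return {t for t in slots if t is not None}


# Hypothetical example: 3 humans and 4 objects give 12 pairs to score.
pairs = pairwise_candidates(num_humans=3, num_objects=4)

# Hypothetical decoder output with 4 slots, two of them empty.
slots = [Triplet(0, 2, "ride"), None, Triplet(1, 3, "hold"), None]
triplets = decode_set(slots)
```

In this sketch the pairwise route has to consider all 12 combinations, while the set route reads the answer from a fixed number of slots regardless of how many instances were detected.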
Pages: 74 - 83
Number of pages: 10