HOTR: End-to-End Human-Object Interaction Detection with Transformers

Cited by: 164
Authors
Kim, Bumsoo [1, 2]
Lee, Junhyun [2 ]
Kang, Jaewoo [2 ]
Kim, Eun-Sol [1 ]
Kim, Hyunwoo J. [2 ]
Affiliations
[1] Kakao Brain, Seongnam, South Korea
[2] Korea Univ, Seoul, South Korea
Funding
National Research Foundation of Singapore
DOI
10.1109/CVPR46437.2021.00014
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human-Object Interaction (HOI) detection is the task of identifying "a set of interactions" in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods have addressed this task indirectly by detecting human and object instances and individually inferring every pair of the detected instances. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of <human, object, interaction> triplets from an image based on a transformer encoder-decoder architecture. Through set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.
Pages: 74-83
Number of pages: 10