Parallel Queries for Human-Object Interaction Detection

Cited by: 1
Authors
Chen, Junwen [1]
Yanai, Keiji [1]
Affiliation
[1] Univ Elect Commun, Tokyo, Japan
Keywords
human-object interaction detection; object detection; transformer
DOI
10.1145/3551626.3564944
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Human-Object Interaction (HOI) detection requires localizing pairs of humans and objects. Recent transformer-based methods use query embeddings to represent entire HOI instances, so each decoded embedding must represent the human and the object characteristics at the same time. However, localizing the human and the object simultaneously from such highly integrated embeddings is ambiguous. To address this problem, we split the detection decoding process into subject decoding and object decoding, so that humans and objects are detected in parallel. Our proposed method, Parallel Query Network (PQNet), uses two transformer decoders to decode the subject embeddings and object embeddings in parallel, and a novel verb decoder fuses the representations from detection decoding and predicts the interaction. The verb decoder applies two attention mechanisms: attention between the human and object embeddings, and attention between the fused embeddings and global semantic features. Because the transformer architecture preserves the order of the input query embeddings, the paired human and object boxes are predicted directly by feed-forward networks. By making full use of the object detection part, our proposed architecture outperforms the state-of-the-art baseline method with half the training epochs.
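The data flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python, a weight-free single-head attention, and a sigmoid stand-in for the feed-forward box heads. The names `attend`, `box_head`, and `pqnet_sketch`, the embedding size, and the choice to let subject embeddings attend over object embeddings in the verb stage are all assumptions made for illustration. What it does show is that two decoders fed with index-aligned queries yield human and object boxes that are paired simply by query position.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Weight-free dot-product attention with a residual connection.

    Each query is processed independently, so output row i always
    corresponds to input query i (query order is preserved).
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        weights = softmax(scores)
        ctx = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
        out.append([a + c for a, c in zip(q, ctx)])
    return out


def box_head(embedding):
    """Hypothetical stand-in for a feed-forward box head: (cx, cy, w, h) in [0, 1]."""
    return tuple(1.0 / (1.0 + math.exp(-v)) for v in embedding[:4])


def pqnet_sketch(image_features, subject_queries, object_queries):
    # Detection decoding: two independent decoders over the same image features.
    subj = attend(subject_queries, image_features)
    obj = attend(object_queries, image_features)
    # Verb decoding, step 1 (assumed form): human embeddings attend to object embeddings.
    fused = attend(subj, obj)
    # Verb decoding, step 2: fused embeddings attend to global image features.
    verb = attend(fused, image_features)
    # Query i of each decoder yields the i-th human box, object box, and verb embedding.
    return [
        {"human_box": box_head(s), "object_box": box_head(o), "verb_embedding": v}
        for s, o, v in zip(subj, obj, verb)
    ]


random.seed(0)
DIM = 8  # assumed embedding size, chosen only to keep the example small


def rand_vecs(n):
    return [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(n)]


image_features = rand_vecs(5)   # stand-in for encoder output tokens
subject_queries = rand_vecs(3)  # one subject query per candidate HOI instance
object_queries = rand_vecs(3)   # index-aligned object queries

pairs = pqnet_sketch(image_features, subject_queries, object_queries)
for i, p in enumerate(pairs):
    print(i, [round(v, 2) for v in p["human_box"]], [round(v, 2) for v in p["object_box"]])
```

Reordering the subject and object queries with the same permutation reorders the predicted pairs in the same way, which is the property that lets the boxes be paired by index without a separate matching step between the two decoders.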
Pages: 8