End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation

Authors
Wu, Mingrui [1 ,2 ]
Gu, Jiaxin [3 ]
Shen, Yunhang [2 ]
Lin, Mingbao [2 ]
Chen, Chao [2 ]
Sun, Xiaoshuai [1 ,4 ,5 ]
Affiliations
[1] Xiamen Univ, Sch Informat, MAC Lab, Xiamen, Peoples R China
[2] Tencent, Youtu Lab, Shenzhen, Peoples R China
[3] VIS Baidu Inc, Beijing, Peoples R China
[4] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China
[5] Xiamen Univ, Fujian Engn Res Ctr Trusted Artificial Intelligen, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most existing Human-Object Interaction (HOI) detection methods rely heavily on full annotations with predefined HOI categories, which are limited in diversity and costly to scale. We aim to advance zero-shot HOI detection so that both seen and unseen HOIs are detected simultaneously. The fundamental challenges are discovering potential human-object pairs and identifying novel HOI categories. To overcome these challenges, we propose a novel End-to-end zero-shot HOI Detection (EoID) framework based on vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to distinguish interactive human-object pairs in an action-agnostic manner. We then transfer the distribution of action probabilities from the pretrained vision-language teacher, together with the seen ground truth, to the HOI model to attain zero-shot HOI classification. Extensive experiments on the HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. EoID also outperforms the previous state-of-the-art methods under various zero-shot settings. Moreover, our method generalizes to large-scale object detection data, which further scales up the action sets. The source code is available at: https://github.com/mrwu-mac/EoID.
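The distillation step described in the abstract, where the HOI model learns from both the teacher's action probabilities and the seen ground truth, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that ground-truth labels are used for seen action classes, that teacher probabilities are used as soft targets for the remaining classes, and that the loss is a binary cross-entropy over per-action sigmoid outputs. The function names `distillation_targets` and `soft_bce` and all numeric values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def distillation_targets(teacher_probs, gt_labels, seen_mask):
    """Build per-action targets for one human-object pair.

    Assumption: seen actions take the ground-truth label (0 or 1);
    all other actions take the teacher's probability as a soft target.
    """
    return [
        float(gt) if seen else float(tp)
        for tp, gt, seen in zip(teacher_probs, gt_labels, seen_mask)
    ]


def soft_bce(student_logits, targets, eps=1e-7):
    """Mean binary cross-entropy between student outputs and soft targets."""
    total = 0.0
    for z, t in zip(student_logits, targets):
        # Clamp the probability so that log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical example with four action classes; the first two are "seen".
teacher_probs = [0.10, 0.70, 0.05, 0.60]
gt_labels = [0, 1, 0, 0]
seen_mask = [True, True, False, False]

targets = distillation_targets(teacher_probs, gt_labels, seen_mask)
loss = soft_bce([-2.0, 1.5, -3.0, 0.5], targets)
```

In this sketch, the unseen classes never receive a hard label, so the student can only learn about them through the teacher's soft scores; that is the property the abstract relies on for zero-shot classification.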
Pages: 2839-2846
Number of pages: 8