Towards zero-shot human-object interaction detection via vision-language integration

Cited by: 0
Authors
Xue, Weiying [1]
Liu, Qi [1]
Wang, Yuxiao [1]
Wei, Zhenao [1]
Xing, Xiaofen [1]
Xu, Xiangmin [1]
Affiliations
[1] South China Univ Technol, Sch Future Technol, Guangzhou 511400, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human-object interaction; Multimodal integration; Zero-shot; Weak supervision;
DOI
10.1016/j.neunet.2025.107348
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-object interaction (HOI) detection aims to locate human-object pairs in images and identify their interaction categories. Most existing methods rely on supervised learning, which requires extensive manual HOI annotations. This heavy reliance on closed-set supervision limits their ability to generalize to unseen object categories. Inspired by the remarkable zero-shot capabilities of vision-language models (VLMs), we propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates VLM knowledge to improve zero-shot HOI detection. Specifically, we propose a ho-pair encoder that supplements the contextual and interaction-specific semantic representation decoder in our model. Additionally, we propose two fusion strategies to facilitate the transfer of prior knowledge from the VLM. One is visual-level fusion, which produces interaction features with more global context; the other is language-level fusion, which further enhances the capability of the VLM for HOI detection. Extensive experiments on the mainstream HICO-DET and V-COCO datasets demonstrate that our model outperforms previous methods in various zero-shot and fully supervised settings. The source code is available at https://github.com/xwyscut/K2HOI.
Pages: 9
Related papers
50 items in total
  • [31] ZERO-SHOT OBJECT DETECTION WITH TRANSFORMERS
    Zheng, Ye
    Cui, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 444 - 448
  • [32] A Survey of Zero-Shot Object Detection
    Cao, Weipeng
    Yao, Xuyang
    Xu, Zhiwu
    Liu, Ye
    Pan, Yinghui
    Ming, Zhong
    BIG DATA MINING AND ANALYTICS, 2025, 8 (03): : 726 - 750
  • [33] Zero-Shot Camouflaged Object Detection
    Li, Haoran
    Feng, Chun-Mei
    Xu, Yong
    Zhou, Tao
    Yao, Lina
    Chang, Xiaojun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5126 - 5137
  • [34] End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
    Wu, Mingrui
    Gu, Jiaxin
    Shen, Yunhang
    Lin, Mingbao
    Chen, Chao
    Sun, Xiaoshuai
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 2839 - 2846
  • [35] Zero-shot Object Detection for Infrared Images Using Pre-trained Vision and Language Models
    Miwa, Shotaro
    Otsubo, Shun
    Jia, Qu
    Susumu, Yasuaki
    INFRARED TECHNOLOGY AND APPLICATIONS L, 2024, 13046
  • [36] Vision-Language Models Performing Zero-Shot Tasks Exhibit Disparities Between Gender Groups
    Hall, Melissa
    Gustafson, Laura
    Adcock, Aaron
    Misra, Ishan
    Ross, Candace
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2770 - 2777
  • [37] Towards Multimodal Disinformation Detection by Vision-language Knowledge Interaction
    Li, Qilei
    Gao, Mingliang
    Zhang, Guisheng
    Zhai, Wenzhe
    Chen, Jinyong
    Jeon, Gwanggil
    INFORMATION FUSION, 2024, 102
  • [38] Rethinking vision transformer through human-object interaction detection
    Cheng, Yamin
    Zhao, Zitian
    Wang, Zhi
    Duan, Hancong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122
  • [39] Towards Zero-Shot Sign Language Recognition
    Bilge, Yunus Can
    Cinbis, Ramazan Gokberk
    Ikizler-Cinbis, Nazli
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 1217 - 1232
  • [40] Human-Object Interaction Detection via Disentangled Transformer
    Zhou, Desen
    Liu, Zhichao
    Wang, Jian
    Wang, Leshan
    Hu, Tao
    Ding, Errui
    Wang, Jingdong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19546 - 19555