Towards zero-shot human-object interaction detection via vision-language integration

Cited by: 0

Authors:

Xue, Weiying [1]

Liu, Qi [1]

Wang, Yuxiao [1]

Wei, Zhenao [1]

Xing, Xiaofen [1]

Xu, Xiangmin [1]

Affiliations:

[1] South China University of Technology, School of Future Technology, Guangzhou 511400, Guangdong, People's Republic of China

Source:

NEURAL NETWORKS | 2025 / Vol. 187

Funding:

National Natural Science Foundation of China;

Keywords:

Human-object interaction; Multimodal integration; Zero-shot; Weak supervision;

DOI:

10.1016/j.neunet.2025.107348

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human-object interaction (HOI) detection aims to locate human-object pairs and identify their interaction categories in images. Most existing methods focus on supervised learning, which relies on extensive manual HOI annotations. This heavy reliance on closed-set supervised learning limits their ability to generalize to unseen object categories. Inspired by the remarkable zero-shot capabilities of vision-language models (VLMs), we propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates VLM knowledge to improve zero-shot HOI detection. Specifically, we incorporate into our model a ho-pair encoder, which supplements contextual information, and an interaction-specific semantic representation decoder. Additionally, we propose two fusion strategies to facilitate the transfer of prior knowledge from the VLM. One is visual-level fusion, which produces interaction features with more global context; the other is language-level fusion, which further enhances the capability of the VLM for HOI detection. Extensive experiments on the mainstream HICO-DET and V-COCO datasets demonstrate that our model outperforms previous methods in various zero-shot and fully supervised settings. The source code is available at https://github.com/xwyscut/K2HOI.


Pages: 9
