AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval

Times cited: 8

Authors:

Zhu, Hongguang [1]

Wei, Yunchao [1]

Zhao, Yao [1]

Zhang, Chunjie [2,3]

Huang, Shujuan [2,3]

Affiliations:

[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China

[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China

[3] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2023 / Vol. 19 / No. 6

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China

Keywords:

Text-guided image retrieval; multimodal fusion; mixture-of-experts

DOI:

10.1145/3584703

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Text-guided image retrieval combines a reference image and text feedback into a multimodal query and searches for the target image that matches the user's intent. Recent approaches improve performance with multi-level matching, multiple accesses, or multiple subnetworks, at the cost of a heavy storage and computation burden at deployment. In addition, these models not only rely on expert knowledge to handcraft image-text composing modules but also perform inference with a static computational graph. This limits the representation capability and generalization ability of the networks when they face complex and varied combinations of reference images and text feedback. To break free of the static-network paradigm, we introduce a dynamic router mechanism that achieves data-dependent expert activation and flexible collaboration among multiple experts, so as to explore more implicit multimodal fusion patterns. Specifically, we construct AMC, our Adaptive Multi-expert Collaborative network, in which the proposed routers activate experts that perform different levels of image-text interaction. Because the routers dynamically adjust expert activation for the current sample, AMC adapts its fusion mode to each combination of reference image and text feedback and generates a dynamic computational graph for each multimodal query. Extensive experiments on two benchmark datasets demonstrate that, benefiting from the image-text composed representation produced by the adaptive multi-expert collaboration mechanism, AMC achieves better retrieval performance and zero-shot generalization than the state-of-the-art methods while keeping a lightweight model and fast retrieval speed. Moreover, we visualize path activations, attention maps, and retrieval results to better understand the routing decisions and semantic localization ability of AMC.
The codes and pretrained models are available at https://github.com/KevinLight831/AMC.
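The dynamic router described in the abstract follows the general mixture-of-experts pattern: a router scores each expert from the current query, and the composed feature is a gate-weighted combination of the expert outputs. The Python sketch below illustrates only that generic soft-gating pattern; it is not AMC's implementation (the linked repository holds the authors' code). The three toy experts (`expert_image_only`, `expert_additive`, `expert_multiplicative`), the linear softmax router, and all other names here are hypothetical assumptions chosen to stand in for "different levels of image-text interaction".

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical experts (assumption, not AMC's experts): each fuses an image
# feature and a text feature with a different degree of interaction.
def expert_image_only(img: Vector, txt: Vector) -> Vector:
    """No interaction: pass the image feature through unchanged."""
    return list(img)


def expert_additive(img: Vector, txt: Vector) -> Vector:
    """Shallow interaction: element-wise sum."""
    return [i + t for i, t in zip(img, txt)]


def expert_multiplicative(img: Vector, txt: Vector) -> Vector:
    """Stronger interaction: element-wise (Hadamard) product."""
    return [i * t for i, t in zip(img, txt)]


EXPERTS: List[Callable[[Vector, Vector], Vector]] = [
    expert_image_only,
    expert_additive,
    expert_multiplicative,
]


def route(img: Vector, txt: Vector, router_weights: List[Vector]) -> Vector:
    """Hypothetical router: one linear score per expert computed from the
    concatenated query [img; txt], normalized with softmax into gates."""
    query = img + txt  # list concatenation, length 2 * dim
    logits = [sum(w * q for w, q in zip(row, query)) for row in router_weights]
    return softmax(logits)


def compose(img: Vector, txt: Vector, router_weights: List[Vector]) -> Vector:
    """Data-dependent fusion: weight every expert's output by its gate."""
    gates = route(img, txt, router_weights)
    outputs = [expert(img, txt) for expert in EXPERTS]
    return [
        sum(g * out[d] for g, out in zip(gates, outputs))
        for d in range(len(img))
    ]
```

With all-zero router weights every expert receives a gate of 1/3, so `compose` returns the plain average of the three expert outputs; once the router weights are learned, different queries shift their gates toward different experts, which is the data-dependent activation the abstract describes. A common sparse variant keeps only the top-k gates per query so that unselected experts are skipped entirely.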


Pages: 22
