AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval

Cited by: 8
Authors
Zhu, Hongguang [1]
Wei, Yunchao [1]
Zhao, Yao [1]
Zhang, Chunjie [2, 3]
Huang, Shujuan [2, 3]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[3] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
text-guided image retrieval; multimodal fusion; mixture-of-experts
DOI
10.1145/3584703
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-guided image retrieval combines a reference image and text feedback into a multimodal query to search for the image that matches the user's intention. Recent approaches employ multi-level matching, multiple accesses, or multiple subnetworks to improve performance, regardless of the heavy storage and computation burden this imposes at deployment. Additionally, these models not only rely on expert knowledge to handcraft image-text composing modules but also perform inference with a static computational graph. This limits the representation capability and generalization ability of the networks when faced with complex and varied combinations of reference image and text feedback. To break the shackles of the static network concept, we introduce a dynamic router mechanism that achieves data-dependent expert activation and flexible collaboration among multiple experts, exploring more implicit multimodal fusion patterns. Specifically, we construct AMC, our Adaptive Multi-expert Collaborative network, by using the proposed router to activate different experts with different levels of image-text interaction. Since routers can dynamically adjust the activation of experts for the current samples, AMC can adapt its fusion mode to different reference image and text combinations and generate dynamic computational graphs according to varied multimodal queries. Extensive experiments on two benchmark datasets demonstrate that, thanks to the image-text composing representation produced by the adaptive multi-expert collaboration mechanism, AMC achieves better retrieval performance and zero-shot generalization ability than the state-of-the-art method while keeping the model lightweight and retrieval fast. Moreover, we visualize path activation, attention maps, and retrieval results to further understand the routing decisions and semantic localization ability of AMC.
The code and pretrained models are available at https://github.com/KevinLight831/AMC.
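The idea of data-dependent expert activation described in the abstract can be illustrated with a small toy sketch. This is not the AMC architecture or the code from the linked repository: the three experts, the soft (softmax) gating, the random router weights, and every name below (`make_router`, `route`, `compose`, `EXPERTS`) are assumptions chosen only to show how a router might weight experts with different levels of image-text interaction, so that different queries yield different fusion mixtures.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical experts with increasing image-text interaction.
def expert_keep_image(img, txt):
    # No interaction: pass the reference image feature through unchanged.
    return list(img)


def expert_add(img, txt):
    # Additive fusion of image and text features.
    return [i + t for i, t in zip(img, txt)]


def expert_gate(img, txt):
    # Text-conditioned sigmoid gate applied to the image feature.
    return [i / (1.0 + math.exp(-t)) for i, t in zip(img, txt)]


EXPERTS = [expert_keep_image, expert_add, expert_gate]


def make_router(dim, num_experts, seed=0):
    """Random linear router weights (untrained; an assumption for the demo)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(2 * dim)]
            for _ in range(num_experts)]


def route(router_weights, img, txt):
    """Return one gate weight per expert, computed from the query itself."""
    query = list(img) + list(txt)
    scores = [sum(w * q for w, q in zip(row, query)) for row in router_weights]
    return softmax(scores)


def compose(router_weights, img, txt):
    """Fuse image and text features as a gate-weighted sum of expert outputs."""
    gates = route(router_weights, img, txt)
    outputs = [expert(img, txt) for expert in EXPERTS]
    fused = [sum(g * out[d] for g, out in zip(gates, outputs))
             for d in range(len(img))]
    return fused, gates


# Usage: two different queries receive different expert mixtures.
router = make_router(dim=4, num_experts=len(EXPERTS))
fused_a, gates_a = compose(router, [0.2, -0.1, 0.5, 0.3], [1.0, 0.0, -0.5, 0.2])
fused_b, gates_b = compose(router, [0.9, 0.4, -0.2, 0.0], [-0.3, 0.8, 0.1, -1.0])
```

Because the gate weights depend on the concatenated query, the effective computation changes from sample to sample, which is the property the abstract contrasts with a static computational graph. A hard (top-k) router would be an equally plausible variant; the soft weighting here was picked only for brevity.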
Pages: 22