Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval

Times cited: 16
Authors
Liu, Yang [1, 3]
Chen, Qingchao [2, 4]
Albanie, Samuel [3]
Affiliations
[1] Peking University, Wangxuan Institute of Computer Technology, Beijing, China
[2] Peking University, National Institute of Health Data Science, Beijing, China
[3] University of Oxford, Visual Geometry Group, Oxford, England
[4] University of Oxford, Department of Engineering Science, Oxford, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1109/CVPR46437.2021.01471
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study the task of visual-text retrieval in the highly practical setting in which labelled visual data with paired text descriptions are available in one domain (the "source"), but only unlabelled visual data (without text descriptions) are available in the domain of interest (the "target"). We propose the ADAPTIVE CROSS-MODAL PROTOTYPES framework, which seeks to enable target-domain retrieval by learning cross-modal visual-text representations while minimising both uni-modal and cross-modal distribution shift across the source and target domains. Our approach is built upon two key ideas. First, we encode the inductive bias that the learned cross-modal representations should be compositional with respect to concepts in each modality; this is achieved by clustering pretrained uni-modal features across each domain and designing a careful regularisation scheme to preserve the resulting structure. Second, we employ mutual information maximisation between cross-modal representations in the source and target domains during learning; this provides a mechanism that preserves commonalities between the domains while discarding signal in each that cannot be inferred from the other. We showcase our approach for the task of cross-domain visual-text retrieval, outperforming existing approaches for both images and videos.
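The two ideas in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the use of k-means to obtain prototypes, the softmax-over-distances soft assignment, the KL divergence as a structure-preserving regulariser, the InfoNCE loss as the mutual information estimator, and every function name (`kmeans`, `assign`, `kl`, `info_nce`) and default value are assumptions chosen for illustration.

```python
# Illustrative sketch only: names, defaults and technique choices are
# hypothetical and are NOT taken from the paper's code.
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means over pretrained uni-modal features.
    The resulting centroids play the role of (assumed) prototypes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def assign(feature, prototypes, temperature=1.0):
    """Soft assignment: softmax over negative squared distances to prototypes."""
    logits = [-sq_dist(feature, c) / temperature for c in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q). A hypothetical regulariser: it grows when a learned
    embedding's prototype assignment q drifts from the pretrained one p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] is paired with positives[i]; every other
    row serves as a negative. Minimising it maximises a lower bound on
    the mutual information between the two sets of representations."""
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cos(a, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        loss += log_z - logits[i]  # cross-entropy with target index i
    return loss / len(anchors)


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
    protos = kmeans(feats, k=2)
    print(assign(feats[0], protos))
```

In this toy setting, aligned pairs give an InfoNCE loss near zero while swapped pairs give a large loss, which is the signal a mutual-information objective would push on. How the paper actually combines these terms across source and target domains is not specified in the abstract.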
Pages: 14949-14959
Number of pages: 11