Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval

Cited by: 16
Authors
Liu, Yang [1, 3]
Chen, Qingchao [2, 4]
Albanie, Samuel [3 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China
[3] Univ Oxford, Visual Geometry Grp, Oxford, England
[4] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
10.1109/CVPR46437.2021.01471
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we study the task of visual-text retrieval in the highly practical setting in which labelled visual data with paired text descriptions are available in one domain (the "source"), but only unlabelled visual data (without text descriptions) are available in the domain of interest (the "target"). We propose the Adaptive Cross-Modal Prototypes framework, which seeks to enable target domain retrieval by learning cross-modal visual-text representations while minimising both uni-modal and cross-modal distribution shift across the source and target domains. Our approach is built upon two key ideas. First, we encode the inductive bias that the learned cross-modal representations should be compositional with respect to concepts in each modality; this is achieved through clustering pretrained uni-modal features across each domain and designing a careful regularisation scheme to preserve the resulting structure. Second, we employ mutual information maximisation between cross-modal representations in the source and target domains during learning; this provides a mechanism that preserves commonalities between the domains while discarding signal in each that cannot be inferred from the other. We showcase our approach for the task of cross-domain visual-text retrieval, outperforming existing approaches for both images and videos.
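The two ideas named in the abstract (clustering pretrained features into prototypes, and maximising mutual information between representations) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the regularisation scheme, or the mutual information estimator. Plain k-means and the InfoNCE lower bound are used here as generic stand-ins, and every name (`kmeans_prototypes`, `soft_assignments`, `infonce_lower_bound`), hyperparameter, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def kmeans_prototypes(features, k, iters=20, seed=0):
    """Plain k-means; the (k, d) centroids play the role of 'prototypes'.

    Assumption: the abstract only says 'clustering', so k-means is a stand-in.
    """
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid: (n, k).
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def soft_assignments(features, prototypes, temperature=1.0):
    """Softmax over negative squared distances to the prototypes: (n, k)."""
    d2 = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def infonce_lower_bound(a, b, temperature=0.1):
    """InfoNCE lower bound on the mutual information between paired rows.

    Assumption: a generic estimator, not necessarily the paper's objective.
    Row i of `a` is treated as the positive match of row i of `b`.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Bound = log N minus the contrastive cross-entropy; it never exceeds log N.
    return np.log(len(a)) + np.mean(np.diag(log_softmax))


# Synthetic stand-ins for pretrained uni-modal features (hypothetical data).
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(200, 16))
prototypes = kmeans_prototypes(source_feats, k=5)
assign = soft_assignments(source_feats, prototypes)

# Two views that share signal should yield a higher bound than unrelated ones.
z = rng.normal(size=(64, 8))
paired = z + 0.05 * rng.normal(size=z.shape)
bound_paired = infonce_lower_bound(z, paired)
bound_random = infonce_lower_bound(z, rng.normal(size=z.shape))
```

In a training setup, the negative of such a bound would typically be minimised as a loss term, so that representations sharing signal are pulled together. How the paper pairs source and target representations is not stated in the abstract, so the pairing above is purely illustrative.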
Pages: 14949-14959
Number of pages: 11