Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval

Cited by: 16

Authors:

Liu, Yang [1,3]

Chen, Qingchao [2,4]

Albanie, Samuel [3]

Affiliations:

[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China

[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China

[3] Univ Oxford, Visual Geometry Grp, Oxford, England

[4] Univ Oxford, Dept Engn Sci, Oxford, England

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

DOI:

10.1109/CVPR46437.2021.01471

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we study the task of visual-text retrieval in the highly practical setting in which labelled visual data with paired text descriptions are available in one domain (the "source"), but only unlabelled visual data (without text descriptions) are available in the domain of interest (the "target"). We propose the ADAPTIVE CROSS-MODAL PROTOTYPES framework, which seeks to enable target domain retrieval by learning cross-modal visual-text representations while minimising both uni-modal and cross-modal distribution shift across the source and target domains. Our approach is built upon two key ideas. First, we encode the inductive bias that the learned cross-modal representations should be compositional with respect to concepts in each modality. This is achieved through clustering pretrained uni-modal features across each domain and designing a careful regularisation scheme to preserve the resulting structure. Second, we employ mutual information maximisation between cross-modal representations in the source and target domains during learning. This provides a mechanism that preserves commonalities between the domains while discarding signal in each that cannot be inferred from the other. We showcase our approach for the task of cross-domain visual-text retrieval, outperforming existing approaches for both images and videos.
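The two ideas named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the regulariser, or the mutual information estimator, so the code below assumes plain k-means for the prototypes, a softmax over cosine similarity for assigning features to prototypes, and the InfoNCE contrastive loss as a stand-in mutual information bound. All function names, parameters, and the synthetic features are hypothetical.

```python
import numpy as np


def kmeans_prototypes(features, k, iters=20, seed=0):
    """Plain k-means (an assumption): returns k cluster centres as 'prototypes'."""
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centre: (n, k).
        dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(0)
    return centres


def soft_assign(features, prototypes, temperature=0.1):
    """Softmax over cosine similarity between each feature and each prototype."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / temperature
    logits -= logits.max(1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(1, keepdims=True)


def info_nce(a, b, temperature=0.1):
    """InfoNCE loss (an assumed estimator); row i of `a` is paired with row i of `b`.

    Minimising this loss maximises a lower bound on the mutual information
    between the two sets of representations.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Synthetic stand-ins for pretrained uni-modal features in two domains.
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(64, 8))
target_feats = rng.normal(size=(64, 8)) + 0.5  # crude stand-in for domain shift

prototypes = kmeans_prototypes(np.concatenate([source_feats, target_feats]), k=4)
source_assignments = soft_assign(source_feats, prototypes)
print(source_assignments.shape, info_nce(source_feats, source_feats))
```

In a sketch like this, the prototype assignments would serve as the structure a regulariser tries to preserve, and the contrastive loss would be minimised during training; how the paper actually combines the two is not stated in the abstract.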

Pages: 14949-14959

Page count: 11
