Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

Cited by: 10
Authors
Wang, Yun [1 ]
Zhang, Tong [1 ]
Zhang, Xueya [1 ]
Cui, Zhen [1 ]
Huang, Yuge [2 ]
Shen, Pengcheng [2 ]
Li, Shaoxin [2 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab,Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing, Peoples R China
[2] Tencent, Youtu Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
IMAGE;
DOI
10.1109/ICCV48922.2021.00183
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Graphs play an important role in cross-modal image-text understanding as they characterize the intrinsic structure, which is robust and crucial for measuring cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method for the cross-modal retrieval task. First, graphs are constructed separately for the two input cross-modal samples and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE captures the graph correlation between the input and each corresponding key through optimal transport, and hence characterizes the inter-graph structural relationship well. To further achieve discriminant graph learning, we define a Wasserstein discriminant loss on the coupled graph keys that makes the intra-class (counterpart) keys more compact and the inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance of the proposed method.
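The Wasserstein Graph Embedding described in the abstract rests on optimal transport between the node features of two graphs. The sketch below is a minimal PyTorch illustration under assumed settings (uniform node weights, squared Euclidean cost, and a hypothetical sinkhorn_wasserstein helper); it shows how an entropy-regularized Wasserstein distance between an image graph and a text graph could be computed, and is not the authors' implementation.

import torch

def sinkhorn_wasserstein(x, y, eps=0.1, n_iters=50):
    # x: (n, d) node features of one graph; y: (m, d) node features of the other.
    # Entropy-regularized optimal transport with uniform node weights (Sinkhorn).
    cost = torch.cdist(x, y, p=2) ** 2               # pairwise squared Euclidean cost
    mu = torch.full((x.size(0),), 1.0 / x.size(0))   # uniform marginal over graph-1 nodes
    nu = torch.full((y.size(0),), 1.0 / y.size(0))   # uniform marginal over graph-2 nodes
    K = torch.exp(-cost / eps)                       # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                         # Sinkhorn fixed-point iterations
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)         # approximate optimal transport plan
    return (plan * cost).sum()                       # entropic Wasserstein cost

# Example: 5-node image graph vs. 7-node text graph, each node carrying a 64-d encoder feature.
img_nodes = torch.randn(5, 64)
txt_nodes = torch.randn(7, 64)
print(sinkhorn_wasserstein(img_nodes, txt_nodes))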
Pages: 1793-1802
Number of pages: 10
Related papers
50 records in total
  • [31] Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval
    Zeng, Yawen
    Cao, Da
    Wei, Xiaochi
    Liu, Meng
    Zhao, Zhou
    Qin, Zheng
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2215 - 2224
  • [32] Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval
    Wu, Hongchang
    Guan, Ziyu
    Zhi, Tao
    Zhao, Wei
    Xu, Cai
    Han, Hong
    Yang, Yanning
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 265 - 272
  • [33] Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval
    Meng, Hui
    Zhang, Huaxiang
    Liu, Li
    Liu, Dongmei
    Lu, Xu
    Guo, Xinru
    NEUROCOMPUTING, 2024, 595
  • [34] Category Alignment Adversarial Learning for Cross-Modal Retrieval
    He, Shiyuan
    Wang, Weiyang
    Wang, Zheng
    Xu, Xing
    Yang, Yang
    Wang, Xiaoming
    Shen, Heng Tao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) : 4527 - 4538
  • [35] Adversarial cross-modal retrieval based on dictionary learning
    Shang, Fei
    Zhang, Huaxiang
    Zhu, Lei
    Sun, Jiande
    NEUROCOMPUTING, 2019, 355 : 93 - 104
  • [36] Heterogeneous Metric Learning for Cross-Modal Multimedia Retrieval
    Deng, Jun
    Du, Liang
    Shen, Yi-Dong
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2013, PT I, 2013, 8180 : 43 - 56
  • [37] Deep Multimodal Transfer Learning for Cross-Modal Retrieval
    Zhen, Liangli
    Hu, Peng
    Peng, Xi
    Goh, Rick Siow Mong
    Zhou, Joey Tianyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02) : 798 - 810
  • [38] Learning Relation Alignment for Calibrated Cross-modal Retrieval
    Ren, Shuhuai
    Lin, Junyang
    Zhao, Guangxiang
    Men, Rui
    Yang, An
    Zhou, Jingren
    Sun, Xu
    Yang, Hongxia
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 514 - 524
  • [39] Variational Deep Representation Learning for Cross-Modal Retrieval
    Yang, Chen
    Deng, Zongyong
    Li, Tianyu
    Liu, Hao
    Liu, Libo
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 498 - 510
  • [40] Domain Invariant Subspace Learning for Cross-Modal Retrieval
    Liu, Chenlu
    Xu, Xing
    Yang, Yang
    Lu, Huimin
    Shen, Fumin
    Ji, Yanli
    MULTIMEDIA MODELING, MMM 2018, PT II, 2018, 10705 : 94 - 105