Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

Cited by: 10
Authors
Wang, Yun [1 ]
Zhang, Tong [1 ]
Zhang, Xueya [1 ]
Cui, Zhen [1 ]
Huang, Yuge [2 ]
Shen, Pengcheng [2 ]
Li, Shaoxin [2 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens Informat, Nanjing, Peoples R China
[2] Tencent, Youtu Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
IMAGE;
DOI
10.1109/ICCV48922.2021.00183
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Graphs play an important role in cross-modal image-text understanding, as they characterize intrinsic structure that is robust and crucial for measuring cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method for the cross-modal retrieval task. First, graphs are constructed separately from the two input cross-modal samples and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space through a Wasserstein Graph Embedding (WGE) process, which facilitates similarity measurement. WGE captures the correlation between an input graph and each corresponding key through optimal transport, and thus characterizes inter-graph structural relationships well. To further achieve discriminant graph learning, we define a Wasserstein discriminant loss on the coupled graph keys that makes intra-class (counterpart) keys more compact and inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness of the method and its state-of-the-art performance.
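The core computation in the WGE step is an optimal-transport comparison between an input graph and each dictionary key. Below is a minimal sketch of one such comparison, assuming entropic regularization solved with Sinkhorn iterations, uniform mass over graph nodes, and a squared-Euclidean ground cost; the function name sinkhorn_wasserstein, the dimensions, and these algorithmic choices are illustrative assumptions, not the paper's released implementation.

```python
# A minimal sketch of the optimal-transport comparison behind Wasserstein
# Graph Embedding (WGE): the entropic Wasserstein distance between the
# node-embedding sets of an input graph and one dictionary key graph.
# Sinkhorn iterations, uniform node weights, and the squared-Euclidean
# ground cost are illustrative assumptions, not the authors' code.
import numpy as np

def sinkhorn_wasserstein(X, Y, eps=0.1, n_iters=200):
    """Entropic-regularized Wasserstein distance between two graphs,
    each given as an (n_nodes, d) array of node embeddings."""
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n)                 # uniform mass on input-graph nodes
    b = np.full(m, 1.0 / m)                 # uniform mass on key-graph nodes
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                         # normalize cost for stability
    K = np.exp(-C / eps)                    # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # optimal transport plan
    return float((P * C).sum())             # transport cost <P, C>

# Embedding an input graph against a coupled dictionary: one distance per key.
rng = np.random.default_rng(0)
img_graph = rng.normal(size=(6, 32))                 # 6 nodes, 32-d features
keys = [rng.normal(size=(4, 32)) for _ in range(8)]  # 8 keys of one modality
embedding = np.array([sinkhorn_wasserstein(img_graph, k) for k in keys])
print(embedding.shape)                               # (8,): dictionary-space vector
```

Stacking the distances to all keys gives each modality a representation in the shared dictionary space, where cross-modal similarity can be measured and a discriminant loss can push counterpart keys together and non-counterpart keys apart.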
Pages: 1793-1802
Page count: 10
Related Papers
50 items in total
  • [1] Adversarial Learning for Cross-Modal Retrieval with Wasserstein Distance
    Cheng, Qingrong
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 16 - 29
  • [2] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [3] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [4] Combination subspace graph learning for cross-modal retrieval
    Xu, Gongwen
    Li, Xiaomei
    Shi, Lin
    Zhang, Zhijun
    Zhai, Aidong
    ALEXANDRIA ENGINEERING JOURNAL, 2020, 59 (03) : 1333 - 1343
  • [5] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097
  • [6] COUPLED DICTIONARY LEARNING AND FEATURE MAPPING FOR CROSS-MODAL RETRIEVAL
    Xu, Xing
    Shimada, Atsushi
    Taniguchi, Rin-ichiro
    He, Li
    2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015,
  • [7] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [8] Cross-Modal Retrieval with Improved Graph Convolution
    Zhang, Hongtu
    Hua, Chunjian
    Jiang, Yi
    Yu, Jianfeng
    Chen, Ying
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (11) : 95 - 104
  • [9] Learning to rank with relational graph and pointwise constraint for cross-modal retrieval
    Xu, Qingzhen
    Li, Miao
    Yu, Mengjing
    SOFT COMPUTING, 2019, 23 (19) : 9413 - 9427
  • [10] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300