Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

Cited by: 10
Authors
Wang, Yun [1 ]
Zhang, Tong [1 ]
Zhang, Xueya [1 ]
Cui, Zhen [1 ]
Huang, Yuge [2 ]
Shen, Pengcheng [2 ]
Li, Shaoxin [2 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens Informat, Nanjing, Peoples R China
[2] Tencent, Youtu Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
IMAGE;
DOI
10.1109/ICCV48922.2021.00183
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Graphs play an important role in cross-modal image-text understanding, as they characterize intrinsic structure that is robust and crucial for measuring cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method for the cross-modal retrieval task. First, graphs are constructed separately from the two input cross-modal samples and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space through a Wasserstein Graph Embedding (WGE) process, which facilitates similarity measurement. WGE captures the correlation between an input graph and each corresponding key through optimal transport, and thus characterizes inter-graph structural relationships well. To further achieve discriminant graph learning, we define a Wasserstein discriminant loss on the coupled graph keys that makes intra-class (counterpart) keys more compact and inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness of the method and its state-of-the-art performance.
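The core computation in the WGE step is an optimal-transport comparison between an input graph and each dictionary key. Below is a minimal sketch of one such comparison, assuming entropic regularization solved with Sinkhorn iterations, uniform mass over graph nodes, and a squared-Euclidean ground cost; the function name sinkhorn_wasserstein, the dimensions, and these algorithmic choices are illustrative assumptions, not the paper's released implementation.

```python
# A minimal sketch of the optimal-transport comparison behind Wasserstein
# Graph Embedding (WGE): the entropic Wasserstein distance between the
# node-embedding sets of an input graph and one dictionary key graph.
# Sinkhorn iterations, uniform node weights, and the squared-Euclidean
# ground cost are illustrative assumptions, not the authors' code.
import numpy as np

def sinkhorn_wasserstein(X, Y, eps=0.1, n_iters=200):
    """Entropic-regularized Wasserstein distance between two graphs,
    each given as an (n_nodes, d) array of node embeddings."""
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n)                 # uniform mass on input-graph nodes
    b = np.full(m, 1.0 / m)                 # uniform mass on key-graph nodes
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                         # normalize cost for stability
    K = np.exp(-C / eps)                    # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # optimal transport plan
    return float((P * C).sum())             # transport cost <P, C>

# Embedding an input graph against a coupled dictionary: one distance per key.
rng = np.random.default_rng(0)
img_graph = rng.normal(size=(6, 32))                 # 6 nodes, 32-d features
keys = [rng.normal(size=(4, 32)) for _ in range(8)]  # 8 keys of one modality
embedding = np.array([sinkhorn_wasserstein(img_graph, k) for k in keys])
print(embedding.shape)                               # (8,): dictionary-space vector
```

Stacking the distances to all keys gives each modality a representation in the shared dictionary space, where cross-modal similarity can be measured and a discriminant loss can push counterpart keys together and non-counterpart keys apart.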
Pages: 1793-1802
Page count: 10
Related Papers
50 items in total
  • [1] Adversarial Learning for Cross-Modal Retrieval with Wasserstein Distance
    Cheng, Qingrong
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 16 - 29
  • [2] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [3] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [4] Combination subspace graph learning for cross-modal retrieval
    Xu, Gongwen
    Li, Xiaomei
    Shi, Lin
    Zhang, Zhijun
    Zhai, Aidong
    ALEXANDRIA ENGINEERING JOURNAL, 2020, 59 (03) : 1333 - 1343
  • [5] A Graph Model for Cross-modal Retrieval
    Wang, Shixun
    Pan, Peng
    Lu, Yansheng
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON MULTIMEDIA TECHNOLOGY (ICMT-13), 2013, 84 : 1090 - 1097
  • [6] COUPLED DICTIONARY LEARNING AND FEATURE MAPPING FOR CROSS-MODAL RETRIEVAL
    Xu, Xing
    Shimada, Atsushi
    Taniguchi, Rin-ichiro
    He, Li
    2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015,
  • [7] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [8] Cross-Modal Retrieval with Improved Graph Convolution
    Zhang, Hongtu
    Hua, Chunjian
    Jiang, Yi
    Yu, Jianfeng
    Chen, Ying
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (11) : 95 - 104
  • [9] Learning to rank with relational graph and pointwise constraint for cross-modal retrieval
    Xu, Qingzhen
    Li, Miao
    Yu, Mengjing
    SOFT COMPUTING, 2019, 23 (19) : 9413 - 9427
  • [10] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300