Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

Cited by: 10
Authors
Wang, Yun [1 ]
Zhang, Tong [1 ]
Zhang, Xueya [1 ]
Cui, Zhen [1 ]
Huang, Yuge [2 ]
Shen, Pengcheng [2 ]
Li, Shaoxin [2 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab,Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing, Peoples R China
[2] Tencent, Youtu Lab, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
IMAGE;
DOI
10.1109/ICCV48922.2021.00183
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Graphs play an important role in cross-modal image-text understanding, as they characterize the intrinsic structure that is robust and crucial for measuring cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method for the cross-modal retrieval task. First, graphs are constructed separately from the two input cross-modal samples and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE captures the graph correlation between the input and each corresponding key through optimal transport, and hence well characterizes the inter-graph structural relationship. To further achieve discriminant graph learning, we define a Wasserstein discriminant loss on the coupled graph keys that makes the intra-class (counterpart) keys more compact and the inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance of the proposed method.
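The abstract's core computation is an optimal-transport comparison between two sets of graph-node features, plus a margin-style discriminant objective over the resulting costs. The sketch below is not the authors' implementation; it is a minimal, self-contained illustration of entropic-regularized optimal transport (Sinkhorn iterations) between two node-feature sets, with all function names and parameters (`sinkhorn_ot`, `eps`, `margin`) hypothetical:

```python
import numpy as np

def sinkhorn_ot(X, Y, eps=0.1, n_iter=200):
    """Entropic-regularized OT cost between node-feature sets X (n,d) and Y (m,d),
    with uniform node weights and squared-Euclidean ground cost (a common choice;
    the paper's exact cost and weighting may differ)."""
    n, m = X.shape[0], Y.shape[0]
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise cost matrix
    C = C / C.max()                                     # normalize for numerical stability
    K = np.exp(-C / eps)                                # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)     # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                             # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return (P * C).sum()                                # (normalized) OT cost

# Toy usage: a graph compared with itself should cost far less than
# a comparison with shifted (structurally different) features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
cost_same = sinkhorn_ot(X, X)
cost_diff = sinkhorn_ot(X, X + 5.0)

# Discriminant idea in margin form (hypothetical formulation): pull
# counterpart keys together, push non-counterpart keys apart.
margin = 1.0
disc_loss = max(0.0, margin + cost_same - cost_diff)
```

The discriminant term mirrors the abstract's goal of compact intra-class and dispersed inter-class keys; the paper defines its own Wasserstein discriminant loss rather than this simple margin surrogate.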
Pages: 1793-1802
Page count: 10
Related Papers
50 records in total
  • [21] Cross-Modal Retrieval Using Deep Learning
    Malik, Shaily
    Bhardwaj, Nikhil
    Bhardwaj, Rahul
    Kumar, Saurabh
    PROCEEDINGS OF THIRD DOCTORAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE, DOSCI 2022, 2023, 479 : 725 - 734
  • [22] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [23] Hybrid representation learning for cross-modal retrieval
    Cao, Wenming
    Lin, Qiubin
    He, Zhihai
    He, Zhiquan
    NEUROCOMPUTING, 2019, 345 : 45 - 57
  • [24] Federated learning for supervised cross-modal retrieval
    Li, Ang
    Li, Yawen
    Shao, Yingxia
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (04):
  • [25] Cross-modal Metric Learning with Graph Embedding
    Zhang, Youcai
    Gu, Xiaodong
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 758 - 764
  • [26] Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval
    Cheng, Qingrong
    Gu, Xiaodong
    NEURAL NETWORKS, 2021, 134 : 143 - 162
  • [27] Modality-Fused Graph Network for Cross-Modal Retrieval
    Wu, Fei
    Li, Shuaishuai
    Peng, Guangchuan
    Ma, Yongheng
    Jing, Xiao-Yuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 1094 - 1097
  • [28] Graph Convolutional Network Discrete Hashing for Cross-Modal Retrieval
    Bai, Cong
    Zeng, Chao
    Ma, Qing
    Zhang, Jinglin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4756 - 4767
  • [29] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226
  • [30] Exploring Graph-Structured Semantics for Cross-Modal Retrieval
    Zhang, Lei
    Chen, Leiting
    Zhou, Chuan
    Yang, Fan
    Li, Xin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4277 - 4286