Wasserstein Coupled Graph Learning for Cross-Modal Retrieval

Cited: 10
Authors
Wang, Yun [1 ]
Zhang, Tong [1 ]
Zhang, Xueya [1 ]
Cui, Zhen [1 ]
Huang, Yuge [2 ]
Shen, Pengcheng [2 ]
Li, Shaoxin [2 ]
Yang, Jian [1 ]
Affiliations
[1] PCA Lab, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
[2] Youtu Lab, Tencent, Guangzhou, China
Funding
National Natural Science Foundation of China
Keywords
IMAGE
DOI
10.1109/ICCV48922.2021.00183
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Graphs play an important role in cross-modal image-text understanding, as they characterize the intrinsic structure that is robust and crucial for measuring cross-modal similarity. In this work, we propose a Wasserstein Coupled Graph Learning (WCGL) method for the cross-modal retrieval task. First, graphs are constructed from the two input cross-modal samples separately and passed through the corresponding graph encoders to extract robust features. Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning. Based on this dictionary, the input graphs can be transformed into the dictionary space to facilitate similarity measurement through a Wasserstein Graph Embedding (WGE) process. The WGE captures the graph correlation between the input and each corresponding key through optimal transport, and hence well characterizes the inter-graph structural relationship. To further achieve discriminant graph learning, we define a Wasserstein discriminant loss on the coupled graph keys that makes the intra-class (counterpart) keys more compact and the inter-class (non-counterpart) keys more dispersed, which further promotes the final cross-modal retrieval task. Experimental results demonstrate the effectiveness and state-of-the-art performance of the proposed method.
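To make the WGE step concrete, below is a minimal NumPy sketch of the core idea: each graph is summarized by its node-feature matrix, the correlation between an input graph and a dictionary key is measured as an entropic-regularized Wasserstein (Sinkhorn) cost between uniform distributions over their node features, and the graph's embedding in the dictionary space is its vector of transport costs to all keys. The function names (sinkhorn_distance, wge_embedding), the uniform node weights, the cost normalization, and the squared-Euclidean ground cost are illustrative assumptions, not the authors' implementation.

import numpy as np

def sinkhorn_distance(X, Y, reg=0.05, n_iters=200):
    """Entropic-regularized Wasserstein cost between two node-feature sets.
    X: (n, d) node features of one graph; Y: (m, d) node features of another.
    Uniform node weights are assumed for simplicity (an illustrative choice).
    """
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n)              # uniform mass over graph-1 nodes
    b = np.full(m, 1.0 / m)              # uniform mass over graph-2 nodes
    # Pairwise squared-Euclidean ground cost between node features.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                      # normalize cost for numerical stability
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # (approximate) optimal transport plan
    return float((P * C).sum())          # transport cost under the plan

def wge_embedding(X, dictionary_keys):
    """Embed a graph as its vector of Wasserstein costs to the dictionary keys."""
    return np.array([sinkhorn_distance(X, key) for key in dictionary_keys])

# Toy usage: an image-side graph with 5 nodes and a dictionary of 4 keys,
# each key itself a small set of node features for the same modality.
rng = np.random.default_rng(0)
graph = rng.normal(size=(5, 16))
keys = [rng.normal(size=(6, 16)) for _ in range(4)]
print(wge_embedding(graph, keys))        # 4-dim embedding in dictionary space

In the paper, such embeddings are computed against the coupled keys of each modality, so that two cross-modal inputs become directly comparable in the shared dictionary space.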
Pages: 1793-1802
Page count: 10
Related Papers
50 records in total; entries [41]-[50] shown below
  • [41] Deep adversarial metric learning for cross-modal retrieval
    Xu, Xing
    He, Li
    Lu, Huimin
    Gao, Lianli
    Ji, Yanli
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22(2): 657-672
  • [42] UNSUPERVISED CROSS-MODAL RETRIEVAL THROUGH ADVERSARIAL LEARNING
    He, Li
    Xu, Xing
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 1153-1158
  • [43] Online Asymmetric Similarity Learning for Cross-Modal Retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3984-3993
  • [44] TRAJCROSS: Trajectory Cross-Modal Retrieval with Contrastive Learning
    Jing, Quanliang
    Yao, Di
    Gong, Chang
    Fan, Xinxin
    Wang, Baoli
    Tan, Haining
    Bi, Jingping
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 344-349
  • [45] Adaptive Adversarial Learning based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Wang, Zhongrui
    Gu, Guanghun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [46] Dual Subspaces with Adversarial Learning for Cross-Modal Retrieval
    Xia, Yaxian
    Wang, Wenmin
    Han, Liang
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164: 654-663
  • [47] Universal Weighting Metric Learning for Cross-Modal Retrieval
    Wei, Jiwei
    Yang, Yang
    Xu, Xing
    Zhu, Xiaofeng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44(10): 6534-6545
  • [48] Learning discriminative common alignments for cross-modal retrieval
    Liu, Hui
    Chen, Xiao-Ping
    Hong, Rui
    Zhou, Yan
    Wan, Tian-Cai
    Bai, Tai-Li
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33(2)
  • [49] Deep Hashing Similarity Learning for Cross-Modal Retrieval
    Ma, Ying
    Wang, Meng
    Lu, Guangyun
    Sun, Yajun
    IEEE ACCESS, 2024, 12: 8609-8618
  • [50] Scalable Deep Multimodal Learning for Cross-Modal Retrieval
    Hu, Peng
    Zhen, Liangli
    Peng, Dezhong
    Liu, Pei
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 635 - 644