Cross-modal discriminant adversarial network

Cited: 16
Authors
Hu, Peng [1 ,2 ]
Peng, Xi [1 ]
Zhu, Hongyuan [2 ]
Lin, Jie [2 ]
Zhen, Liangli [3 ]
Wang, Wei [1 ]
Peng, Dezhong [1 ,4 ,5 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst High Performance Comp, Singapore, Singapore
[4] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[5] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adversarial learning; Cross-modal representation learning; Cross-modal retrieval; Discriminant adversarial network; Cross-modal discriminant mechanism; Latent common space;
DOI
10.1016/j.patcog.2020.107734
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval aims at retrieving relevant points across different modalities, such as retrieving images via texts. One key challenge of cross-modal retrieval is narrowing the heterogeneous gap across diverse modalities. To overcome this challenge, we propose a novel method termed Cross-modal discriminant Adversarial Network (CAN). Taking bi-modal data as a showcase, CAN consists of two parallel modality-specific generators, two modality-specific discriminators, and a Cross-modal Discriminant Mechanism (CDM). To be specific, the generators project diverse modalities into a latent cross-modal discriminant space. Meanwhile, the discriminators compete against the generators to alleviate the heterogeneous discrepancy in this space, i.e., the generators try to generate unified features to confuse the discriminators, and the discriminators aim to classify the generated results. To further remove redundancy and preserve discrimination, we propose CDM to project the generated results into a single common space, accompanied by a novel eigenvalue-based loss. Thanks to the eigenvalue-based loss, CDM can push as much discriminative power as possible into all latent directions. To demonstrate the effectiveness of our CAN, comprehensive experiments are conducted on four multimedia datasets, comparing it with 15 state-of-the-art approaches. (C) 2020 Elsevier Ltd. All rights reserved.
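To make the bi-modal pipeline in the abstract concrete, the following is a minimal PyTorch sketch of the generator/discriminator adversarial game plus a CDM projection. The feature dimensions, layer sizes, and the discriminant loss are assumptions for illustration only: the abstract does not specify the exact form of the eigenvalue-based loss, so a between-class-scatter surrogate stands in for it here.

```python
# Minimal sketch of a CAN-style bi-modal setup (assumed dimensions and losses).
import torch
import torch.nn as nn

LATENT, COMMON = 512, 256  # assumed dimensionalities

class Generator(nn.Module):
    """Modality-specific generator: projects one modality into the latent
    cross-modal discriminant space."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, LATENT))

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Modality-specific discriminator: outputs a logit for whether a
    latent code came from its own modality."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)

def eigenvalue_surrogate_loss(c, labels):
    """Surrogate for the paper's eigenvalue-based loss: raise the smallest of
    the informative eigenvalues of the between-class scatter, so no
    discriminant direction is left weak (assumes >= 2 classes per batch)."""
    mu = c.mean(dim=0)
    means = torch.stack([c[labels == k].mean(dim=0) for k in labels.unique()])
    sb = (means - mu).t() @ (means - mu) / means.shape[0]  # between-class scatter
    eig = torch.linalg.eigvalsh(sb)                        # ascending eigenvalues
    return -eig[-(means.shape[0] - 1):].min()

# Toy bi-modal batch: 4096-d image features, 300-d text features (assumed).
img, txt = torch.randn(32, 4096), torch.randn(32, 300)
labels = torch.randint(0, 10, (32,))          # shared semantic labels

gen_img, gen_txt = Generator(4096), Generator(300)
dis_img, dis_txt = Discriminator(), Discriminator()
cdm = nn.Linear(LATENT, COMMON)               # CDM projection to the common space
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

z_img, z_txt = gen_img(img), gen_txt(txt)

# Discriminators learn to tell their own modality apart from the other one...
d_loss = (bce(dis_img(z_img.detach()), ones) + bce(dis_img(z_txt.detach()), zeros)
          + bce(dis_txt(z_txt.detach()), ones) + bce(dis_txt(z_img.detach()), zeros))

# ...while the generators try to confuse them, shrinking the heterogeneous gap,
# and the CDM keeps the shared common space discriminative.
common = cdm(torch.cat([z_img, z_txt], dim=0))
g_loss = (bce(dis_img(z_txt), ones) + bce(dis_txt(z_img), ones)
          + eigenvalue_surrogate_loss(common, torch.cat([labels, labels])))
print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```

In an actual training loop, d_loss and g_loss would be minimized alternately with separate optimizers, as is standard in adversarial learning.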
Pages: 14
Related Papers
50 records in total
  • [41] Semantic Disentanglement Adversarial Hashing for Cross-Modal Retrieval
    Meng, Min
    Sun, Jiaxuan
    Liu, Jigang
    Yu, Jun
    Wu, Jigang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1914 - 1926
  • [42] UNSUPERVISED CROSS-MODAL RETRIEVAL THROUGH ADVERSARIAL LEARNING
    He, Li
    Xu, Xing
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1153 - 1158
  • [43] Deep adversarial metric learning for cross-modal retrieval
    Xu, Xing
    He, Li
    Lu, Huimin
    Gao, Lianli
    Ji, Yanli
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02) : 657 - 672
  • [44] Adversarial Attack on Deep Cross-Modal Hamming Retrieval
    Li, Chao
    Gao, Shangqian
    Deng, Cheng
    Liu, Wei
    Huang, Heng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2198 - 2207
  • [45] Adversarial Learning for Cross-Modal Retrieval with Wasserstein Distance
    Cheng, Qingrong
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2019), PT I, 2019, 11953 : 16 - 29
  • [46] DA-GAN: Dual Attention Generative Adversarial Network for Cross-Modal Retrieval
    Cai, Liewu
    Zhu, Lei
    Zhang, Hongyan
    Zhu, Xinghui
    FUTURE INTERNET, 2022, 14 (02)
  • [47] Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network
    Xu, Xing
    Tian, Jialin
    Lin, Kaiyi
    Lu, Huimin
    Shao, Jie
    Shen, Heng Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [48] R2GAN: Cross-modal Recipe Retrieval with Generative Adversarial Network
    Zhu, Bin
    Ngo, Chong-Wah
    Chen, Jingjing
    Hao, Yanbin
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11469 - 11478
  • [50] Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval
    Jiao, Shichao
    Han, Xie
    Xiong, Fengguang
    Yang, Xiaowen
    Han, Huiyan
    He, Ligang
    Kuang, Liqun
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (16): : 13469 - 13483