Multimodal adversarial network for cross-modal retrieval

Cited by: 48
Authors
Hu, Peng [1 ]
Peng, Dezhong [1 ,2 ,3 ]
Wang, Xu [1 ]
Xiang, Yong [4 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Machine Intelligence Lab, Chengdu 610065, Sichuan, Peoples R China
[2] Chengdu Sobey Digital Technol Co Ltd, Chengdu 610041, Sichuan, Peoples R China
[3] Shenzhen Cyberspace Lab, Shenzhen 518055, Peoples R China
[4] Deakin Univ, Sch Informat Technol, Burwood, Vic 3125, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Cross-modal retrieval; Latent common space; Adversarial learning; Multimodal discriminant analysis; Multimodal representation learning; MULTIVIEW; REPRESENTATION; FUSION;
DOI
10.1016/j.knosys.2019.05.017
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval aims to retrieve pertinent samples across different modalities, which is important in numerous multimodal applications. Correlating multimodal data is challenging because of the large heterogeneity gap between distinct modalities. In this paper, we propose a Multimodal Adversarial Network (MAN) method to project multimodal data into a common space in which the similarities between different modalities can be computed directly with the same distance measure. The proposed MAN consists of multiple modality-specific generators, a discriminator, and a multimodal discriminant analysis (MDA) loss. Through adversarial learning, the generators are pitted against the discriminator to eliminate the cross-modal discrepancy. Furthermore, a novel MDA loss is proposed to preserve as much discrimination as possible across all available dimensions of the generated common representations. Directly optimizing the MDA trace criterion, however, is problematic: the discriminant function overemphasizes 1) the large distances between already well-separated classes and 2) the dominant eigenvalues, which may leave the common representations poorly discriminated. To address these problems, we propose a between-class strategy and an eigenvalue strategy to weaken the largest between-class differences and the dominant eigenvalues, respectively. To the best of our knowledge, the proposed MAN is among the first works designed specifically for multimodal representation learning (more than two modalities) with adversarial learning. To verify the effectiveness of the proposed method, extensive experiments are carried out on four widely used multimodal databases, comparing against 16 state-of-the-art approaches. (C) 2019 Elsevier B.V. All rights reserved.
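The abstract describes the MAN architecture only at a high level. Below is a minimal, illustrative PyTorch sketch of that kind of setup: one generator per modality mapping raw features into a shared space, and a discriminator trained to identify the source modality while the generators are trained to fool it. All names, layer sizes, and the three example modalities are assumptions made for illustration; the paper's actual networks, losses, and training schedule differ, and the MDA loss is left as a placeholder comment.

import torch
import torch.nn as nn

# Hypothetical feature dimensions for three modalities; the paper's datasets differ.
FEAT_DIMS = {"image": 4096, "text": 300, "audio": 128}
COMMON_DIM = 256

class Generator(nn.Module):
    """Modality-specific generator: projects raw features into the common space."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, out_dim))
    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a common-space vector came from."""
    def __init__(self, dim, n_modalities):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_modalities))
    def forward(self, z):
        return self.net(z)

generators = {m: Generator(d, COMMON_DIM) for m, d in FEAT_DIMS.items()}
disc = ModalityDiscriminator(COMMON_DIM, len(FEAT_DIMS))
ce = nn.CrossEntropyLoss()
opt_g = torch.optim.Adam([p for g in generators.values() for p in g.parameters()], lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(batches):
    """batches: {modality_name: feature tensor}, all with the same batch size."""
    zs, labels = [], []
    for idx, (m, x) in enumerate(batches.items()):
        zs.append(generators[m](x))
        labels.append(torch.full((x.size(0),), idx, dtype=torch.long))
    z_all, y_mod = torch.cat(zs), torch.cat(labels)

    # Discriminator step: learn to tell the modalities apart.
    opt_d.zero_grad()
    d_loss = ce(disc(z_all.detach()), y_mod)
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator (negated objective); the full
    # method would also add the MDA discriminant loss on class labels here.
    opt_g.zero_grad()
    g_loss = -ce(disc(z_all), y_mod)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Smoke test with random stand-in features:
batch = {m: torch.randn(8, d) for m, d in FEAT_DIMS.items()}
print(train_step(batch))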
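For the two weighting strategies, a hedged reconstruction in standard discriminant-analysis notation (class means m_i, priors p_i, within-class scatter S_w) may clarify the idea; the symbols and the concrete choices of the weight function and the concave surrogate are mine, not necessarily the paper's exact formulation:

% Pairwise decomposition of the between-class scatter in the common space:
S_b \;=\; \sum_{i<j} p_i\, p_j\, (m_i - m_j)(m_i - m_j)^{\top}

% Between-class strategy: a decreasing weight \omega(\cdot) damps class pairs
% that are already far apart, so well-separated classes stop dominating:
\tilde{S}_b \;=\; \sum_{i<j} p_i\, p_j\, \omega\!\left(\lVert m_i - m_j \rVert\right) (m_i - m_j)(m_i - m_j)^{\top}

% Eigenvalue strategy: rather than the raw trace criterion
% J = \operatorname{tr}\!\left(S_w^{-1}\tilde{S}_b\right) = \sum_k \lambda_k,
% apply a concave function (e.g., log) to the eigenvalues \lambda_k so that
% dominant ones are damped and discrimination spreads over all dimensions:
J \;=\; \sum_k \log\!\left(1 + \lambda_k\right)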
Pages: 38 - 50
Number of pages: 13
Related Papers
50 records in total
  • [1] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [2] DEEP ADVERSARIAL QUANTIZATION NETWORK FOR CROSS-MODAL RETRIEVAL
    Zhou, Yu
    Feng, Yong
    Zhou, Mingliang
    Qiang, Baohua
    U, Leong Hou
    Zhu, Jiajie
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4325 - 4329
  • [3] Adversarial Graph Convolutional Network for Cross-Modal Retrieval
    Dong, Xinfeng
    Liu, Li
    Zhu, Lei
    Nie, Liqiang
    Zhang, Huaxiang
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1634 - 1645
  • [4] Information Aggregation Semantic Adversarial Network for Cross-Modal Retrieval
    Wang, Hongfei
    Feng, Aimin
    Liu, Xuejun
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [5] Adversarial Modality Alignment Network for Cross-Modal Molecule Retrieval
    Zhao, Wenyu
    Zhou, Dong
    Cao, Buqing
    Zhang, Kai
    Chen, Jinjun
    [J]. IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (01) : 278 - 289
  • [6] Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval
    Wu, Hongchang
    Guan, Ziyu
    Zhi, Tao
    Zhao, Wei
    Xu, Cai
    Han, Hong
    Yang, Yaming
    [J]. 2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 265 - 272
  • [7] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Huang, Zi
    [J]. ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 46 - 54
  • [8] MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval
    Huang, Xin
    Peng, Yuxin
    Yuan, Mingkuan
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (03) : 1047 - 1059
  • [9] Dual discriminant adversarial cross-modal retrieval
    He, Pei
    Wang, Meng
    Tu, Ding
    Wang, Zhuo
    [J]. APPLIED INTELLIGENCE, 2023, 53 : 4257 - 4267
  • [10] Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
    Liao, Lei
    Yang, Meng
    Zhang, Bob
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 920 - 934