Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval

Cited by: 2
Authors
Wu, Hongchang [1 ]
Guan, Ziyu [2 ]
Zhi, Tao [3 ]
Zhao, Wei [1 ]
Xu, Cai [2 ]
Han, Hong [2 ]
Yang, Yaming [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Keywords
Cross-modal retrieval; graph attention; self attention; generative adversarial network;
DOI
10.1109/ICBK.2019.00043
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies focused on learning a common space of multiple modalities for retrieval, they assumed data to be i.i.d. and failed to learn the underlying semantic structure which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
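The sketch below illustrates the attention-based aggregation step described in the abstract, where per-modality embeddings of one matched item are combined into a single common-space embedding. It is a minimal, self-contained approximation and not the authors' implementation; all names (ModalityAttentionAggregator, embed_dim, K) are illustrative assumptions, and the simple L2 pull at the end stands in for the adversarial objective of the MGAN.

```python
# Minimal sketch (assumed PyTorch implementation, not the paper's code) of
# attention-weighted aggregation of K per-modality embeddings into a common
# embedding space, plus a surrogate loss pulling each modality toward it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityAttentionAggregator(nn.Module):
    """Hypothetical module: aggregates K modality embeddings of a matched item."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # scores each modality embedding

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, K, embed_dim) -- one embedding per modality for each item
        alpha = F.softmax(self.score(z), dim=1)   # (batch, K, 1) attention weights
        return (alpha * z).sum(dim=1)             # (batch, embed_dim) aggregation


if __name__ == "__main__":
    batch, K, d = 4, 3, 128                       # e.g. image / text / audio
    z = torch.randn(batch, K, d)                  # stand-in for SAT outputs
    agg = ModalityAttentionAggregator(d)
    c = agg(z)                                    # common-space embedding
    # Surrogate for "clustering" matched embeddings around the aggregation;
    # the paper uses an adversarial game (MGAN) rather than this L2 pull.
    align_loss = ((z - c.unsqueeze(1)) ** 2).mean()
    print(c.shape, float(align_loss))
```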
Pages: 265-272 (8 pages)
Related Papers
50 records in total
  • [31] Adversarial Cross-modal Domain Adaptation for Multi-modal Semantic Segmentation in Autonomous Driving
    Shi, Mengqi
    Cao, Haozhi
    Xie, Lihua
    Yang, Jianfei
    2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2022, : 850 - 855
  • [32] Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval
    Zhang, Xi
    Lai, Hanjiang
    Feng, Jiashi
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 614 - 629
  • [33] Multi-level adversarial attention cross-modal hashing
    Wang, Benhui
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [34] A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval
    Williams-Lekuona, Mikel
    Cosma, Georgina
    Phillips, Iain
    JOURNAL OF IMAGING, 2022, 8 (12)
  • [35] Dual discriminant adversarial cross-modal retrieval
He, Pei
    Wang, Meng
    Tu, Ding
    Wang, Zhuo
    APPLIED INTELLIGENCE, 2023, 53 : 4257 - 4267
  • [36] Dual discriminant adversarial cross-modal retrieval
    He, Pei
    Wang, Meng
    Tu, Ding
    Wang, Zhuo
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4257 - 4267
  • [37] Augmented Adversarial Training for Cross-Modal Retrieval
    Wu, Yiling
    Wang, Shuhui
    Song, Guoli
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 559 - 571
  • [38] Robust Cross-Modal Retrieval by Adversarial Training
    Zhang, Tao
    Sun, Shiliang
    Zhao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [39] Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval
    Zou, Zhuoyang
    Zhu, Xinghui
    Zhu, Qinying
    Zhang, Hongyan
    Zhu, Lei
    FOODS, 2024, 13 (11)
  • [40] Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval
    Li, Chuang
    Fei, Lunke
    Kang, Peipei
    Liang, Jiahao
    Fang, Xiaozhao
    Teng, Shaohua
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 459 - 472