Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval

Cited by: 2
Authors
Wu, Hongchang [1 ]
Guan, Ziyu [2 ]
Zhi, Tao [3 ]
zhao, Wei [1 ]
Xu, Cai [2 ]
Han, Hong [2 ]
Yang, Yarning [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Keywords
Cross-modal retrieval; graph attention; self-attention; generative adversarial network
DOI
10.1109/ICBK.2019.00043
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies focused on learning a common space of multiple modalities for retrieval, they assumed data to be i.i.d. and failed to learn the underlying semantic structure which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
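The sketch below is a minimal, hypothetical PyTorch rendering of the pipeline the abstract describes: a per-modality self-attention encoder (the SAT), an attention-weighted aggregation of matched items' embeddings into a common space (standing in for the GAT), a discriminator that the per-modality embeddings must fool (the MGAN idea of pulling them toward the aggregation), and a within-class/between-class distance constraint. All module names, dimensions, losses, and the use of nn.MultiheadAttention and a simple linear attention score are assumptions for illustration; the paper's actual architecture, losses, and training procedure (including the discriminator's own update) may differ.

# Hypothetical sketch of the AGAT pipeline described in the abstract.
# Names, dimensions, and loss weights are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """SAT-like encoder: self-attention over feature tokens within one modality."""

    def __init__(self, in_dim, embed_dim, num_heads=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):                 # x: (batch, tokens, in_dim)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)         # feature-level correlations in this modality
        return self.out(h.mean(dim=1))    # (batch, embed_dim) item embedding


class CrossModalAggregator(nn.Module):
    """GAT-like attention fusing matched items' embeddings into a common embedding."""

    def __init__(self, embed_dim):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, embeds):            # embeds: (batch, K modalities, embed_dim)
        w = torch.softmax(self.score(embeds), dim=1)   # attention over modalities
        return (w * embeds).sum(dim=1)                 # (batch, embed_dim)


class Discriminator(nn.Module):
    """MGAN critic: tells single-modality embeddings from the aggregation."""

    def __init__(self, embed_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, 1))

    def forward(self, z):
        return self.net(z)                # logit: aggregated (real) vs. per-modality (fake)


def structure_loss(common, labels, margin=1.0):
    """Pull within-class embeddings together, push between-class pairs apart."""
    d = torch.cdist(common, common)                      # pairwise distances
    same = (labels[:, None] == labels[None, :]).float()
    return (same * d + (1 - same) * F.relu(margin - d)).mean()


if __name__ == "__main__":
    # Toy example: K = 3 modalities with different raw feature sizes.
    torch.manual_seed(0)
    batch, dims, embed_dim = 8, [128, 300, 64], 32
    encoders = nn.ModuleList([ModalityEncoder(d, embed_dim) for d in dims])
    agg, disc = CrossModalAggregator(embed_dim), Discriminator(embed_dim)

    feats = [torch.randn(batch, 10, d) for d in dims]    # matched items across modalities
    labels = torch.randint(0, 4, (batch,))

    embeds = torch.stack([enc(x) for enc, x in zip(encoders, feats)], dim=1)
    common = agg(embeds)                                 # common-space embedding

    # Generator side: each modality embedding should fool the critic (look like the
    # aggregation), while the common space preserves class structure.
    adv_g = F.binary_cross_entropy_with_logits(
        disc(embeds.reshape(-1, embed_dim)),
        torch.ones(batch * len(dims), 1))
    loss = adv_g + structure_loss(common, labels)
    print(float(loss))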
Pages: 265-272
Number of pages: 8
Related Papers
50 records in total
  • [41] Cross-modal discriminant adversarial network
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Lin, Jie
    Zhen, Liangli
    Wang, Wei
    Peng, Dezhong
    PATTERN RECOGNITION, 2021, 112
  • [42] Graph Convolutional Network Discrete Hashing for Cross-Modal Retrieval
    Bai, Cong
    Zeng, Chao
    Ma, Qing
    Zhang, Jinglin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4756 - 4767
  • [43] Modality-Fused Graph Network for Cross-Modal Retrieval
    Wu, Fei
Li, Shuaishuai
    Peng, Guangchuan
    Ma, Yongheng
    Jing, Xiao-Yuan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 1094 - 1097
  • [44] BCAN: Bidirectional Correct Attention Network for Cross-Modal Retrieval
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Meng, Fanyang
    Liu, Mengyuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14247 - 14258
  • [45] Heterogeneous Attention Network for Effective and Efficient Cross-modal Retrieval
    Yu, Tan
    Yang, Yi
    Li, Yi
    Liu, Lin
    Fei, Hongliang
    Li, Ping
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1146 - 1156
  • [46] Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval
    Zou, Qiang
    Cheng, Shuli
    Du, Anyu
    Chen, Jiayi
    ENTROPY, 2024, 26 (11)
• [47] CSAN: cross-coupled semantic adversarial network for cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Meng, Fanzhen
    Gu, Guanghua
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (05)
  • [48] Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
    Liao, Lei
    Yang, Meng
    Zhang, Bob
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 920 - 934
  • [49] Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval
    Ma, Xinhong
    Zhang, Tianzhu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3101 - 3114
  • [50] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300