Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval

Cited by: 2
Authors
Wu, Hongchang [1 ]
Guan, Ziyu [2 ]
Zhi, Tao [3 ]
Zhao, Wei [1 ]
Xu, Cai [2 ]
Han, Hong [2 ]
Yang, Yaming [2 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Xidian Univ, Xian, Peoples R China
[3] Xidian Univ, Sch Artificial Intelligence, Xian, Peoples R China
Keywords
Cross-modal retrieval; graph attention; self attention; generative adversarial network;
DOI
10.1109/ICBK.2019.00043
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Existing cross-modal retrieval methods are mainly constrained to the bimodal case. When applied to the multi-modal case, we need to train O(K^2) (K: number of modalities) separate models, which is inefficient and unable to exploit common information among multiple modalities. Though some studies focused on learning a common space of multiple modalities for retrieval, they assumed data to be i.i.d. and failed to learn the underlying semantic structure which could be important for retrieval. To tackle this issue, we propose an extensive Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval (AGAT). AGAT synthesizes a self-attention network (SAT), a graph attention network (GAT) and a multi-modal generative adversarial network (MGAN). The SAT generates high-level embeddings for data items from different modalities, with self-attention capturing feature-level correlations in each modality. The GAT then uses attention to aggregate embeddings of matched items from different modalities to build a common embedding space. The MGAN aims to "cluster" matched embeddings of different modalities in the common space by forcing them to be similar to the aggregation. Finally, we train the common space so that it captures the semantic structure by constraining within-class/between-class distances. Experiments on three datasets show the effectiveness of AGAT.
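The sketch below illustrates the attention-based aggregation step described in the abstract, where per-modality embeddings of one matched item are combined into a single common-space embedding. It is a minimal, self-contained approximation and not the authors' implementation; all names (ModalityAttentionAggregator, embed_dim, K) are illustrative assumptions, and the simple L2 pull at the end stands in for the adversarial objective of the MGAN.

```python
# Minimal sketch (assumed PyTorch implementation, not the paper's code) of
# attention-weighted aggregation of K per-modality embeddings into a common
# embedding space, plus a surrogate loss pulling each modality toward it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityAttentionAggregator(nn.Module):
    """Hypothetical module: aggregates K modality embeddings of a matched item."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # scores each modality embedding

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, K, embed_dim) -- one embedding per modality for each item
        alpha = F.softmax(self.score(z), dim=1)   # (batch, K, 1) attention weights
        return (alpha * z).sum(dim=1)             # (batch, embed_dim) aggregation


if __name__ == "__main__":
    batch, K, d = 4, 3, 128                       # e.g. image / text / audio
    z = torch.randn(batch, K, d)                  # stand-in for SAT outputs
    agg = ModalityAttentionAggregator(d)
    c = agg(z)                                    # common-space embedding
    # Surrogate for "clustering" matched embeddings around the aggregation;
    # the paper uses an adversarial game (MGAN) rather than this L2 pull.
    align_loss = ((z - c.unsqueeze(1)) ** 2).mean()
    print(c.shape, float(align_loss))
```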
Pages: 265-272 (8 pages)
Related Papers
50 records in total
  • [31] Adversarial Cross-modal Domain Adaptation for Multi-modal Semantic Segmentation in Autonomous Driving
    Shi, Mengqi
    Cao, Haozhi
    Xie, Lihua
    Yang, Jianfei
    2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2022, : 850 - 855
  • [32] Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval
    Zhang, Xi
    Lai, Hanjiang
    Feng, Jiashi
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 614 - 629
  • [33] Multi-level adversarial attention cross-modal hashing
    Wang, Benhui
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [34] A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval
    Williams-Lekuona, Mikel
    Cosma, Georgina
    Phillips, Iain
    JOURNAL OF IMAGING, 2022, 8 (12)
  • [35] Dual discriminant adversarial cross-modal retrieval
He, Pei
    Wang, Meng
    Tu, Ding
    Wang, Zhuo
    APPLIED INTELLIGENCE, 2023, 53 : 4257 - 4267
  • [36] Dual discriminant adversarial cross-modal retrieval
    He, Pei
    Wang, Meng
    Tu, Ding
    Wang, Zhuo
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4257 - 4267
  • [37] Augmented Adversarial Training for Cross-Modal Retrieval
    Wu, Yiling
    Wang, Shuhui
    Song, Guoli
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 559 - 571
  • [38] Robust Cross-Modal Retrieval by Adversarial Training
    Zhang, Tao
    Sun, Shiliang
    Zhao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [39] Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval
    Zou, Zhuoyang
    Zhu, Xinghui
    Zhu, Qinying
    Zhang, Hongyan
    Zhu, Lei
    FOODS, 2024, 13 (11)
  • [40] Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval
    Li, Chuang
    Fei, Lunke
    Kang, Peipei
    Liang, Jiahao
    Fang, Xiaozhao
    Teng, Shaohua
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 459 - 472