Cross-Modal Retrieval Augmentation for Multi-Modal Classification

Cited by: 0

Authors:

Gur, Shir [1,3]

Neverova, Natalia [2]

Stauffer, Chris [2]

Lim, Ser-Nam [2]

Kiela, Douwe [2]

Reiter, Austin [2]

Affiliations:

[1] Tel Aviv Univ, Tel Aviv, Israel

[2] Facebook AI, Menlo Pk, CA USA

[3] FAIR, Menlo Pk, CA USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021 | 2021

Keywords:

KNOWLEDGE; LANGUAGE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent advances in using retrieval components over external knowledge sources have shown impressive results for a variety of downstream tasks in natural language processing. Here, we explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering (VQA). First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvements in performance on image-caption retrieval w.r.t. similar methods. Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines. We further conduct extensive experiments to establish the promise of this approach, and examine novel applications for inference time such as hot-swapping indices.

Pages: 111-123

Page count: 13
