Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Cited by: 0
Authors:
Hu, Xuming [1]
Guo, Zhijiang [2]
Teng, Zhiyang [3]
King, Irwin [4]
Yu, Philip S. [1,5]
Affiliations:
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Cambridge, Cambridge CB2 1TN, England
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[5] Univ Illinois, Chicago, IL USA
Keywords: Not available
DOI: Not available
CLC Classification Code: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multimodal relation extraction (MRE) is the task of identifying the semantic relationship between two entities based on the context of a sentence-image pair. Existing retrieval-augmented approaches have mainly focused on modeling the retrieved textual knowledge, which may not be sufficient to accurately identify complex relations. To improve prediction, this work proposes retrieving textual and visual evidence based on the object, the sentence, and the whole image. We further develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning within and across modalities. Extensive experiments and analyses show that the proposed method effectively selects and compares evidence across modalities and significantly outperforms state-of-the-art models. Code and data are available.
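The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the general idea it describes: retrieve top-k evidence at each granularity (object, sentence, whole image) from cross-modal memories of precomputed embeddings, then synthesize the pooled evidence with a cross-attention layer before classifying the relation. The embedding size, the retrieval depth, the 23-way output (the commonly cited size of the MNRE label set), and the fusion layout are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of multi-granularity cross-modal
# retrieval followed by attention-based synthesis for relation classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 256  # shared embedding size (assumption)

def retrieve_top_k(query: torch.Tensor, corpus: torch.Tensor, k: int = 3):
    """Return the k corpus embeddings most similar to the query vector."""
    sims = F.cosine_similarity(query.unsqueeze(0), corpus, dim=-1)  # (N,)
    return corpus[sims.topk(k).indices]                             # (k, D)

class EvidenceSynthesizer(nn.Module):
    """Fuse object-, sentence-, and image-level evidence via cross-attention."""
    def __init__(self, dim: int = D, num_relations: int = 23):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_relations)

    def forward(self, pair_repr: torch.Tensor, evidence: torch.Tensor):
        # pair_repr: (1, 1, D) entity-pair query; evidence: (1, M, D) memory.
        fused, _ = self.attn(pair_repr, evidence, evidence)
        return self.classifier(fused.squeeze(1))  # (1, num_relations) logits

# Toy usage: random vectors stand in for encoder outputs and retrieval indexes.
text_corpus = torch.randn(100, D)          # textual evidence memory
image_corpus = torch.randn(100, D)         # visual evidence memory
obj_q, sent_q, img_q = torch.randn(3, D)   # object / sentence / image queries

evidence = torch.cat([
    retrieve_top_k(obj_q, image_corpus),   # visual evidence for the objects
    retrieve_top_k(sent_q, text_corpus),   # textual evidence for the sentence
    retrieve_top_k(img_q, text_corpus),    # textual evidence for the image
]).unsqueeze(0)                            # (1, 9, D)

model = EvidenceSynthesizer()
logits = model(torch.randn(1, 1, D), evidence)
print(logits.shape)  # torch.Size([1, 23])
```

In this layout a single attention layer lets the entity-pair representation weigh evidence from both modalities jointly; the paper's actual synthesis of object-, image-, and sentence-level information may differ.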
Pages: 303-311
Number of pages: 9
Related Papers (50 records; 10 shown)
  • [1] Multimodal adversarial network for cross-modal retrieval
    Hu, Peng; Peng, Dezhong; Wang, Xu; Xiang, Yong
    KNOWLEDGE-BASED SYSTEMS, 2019, 180: 38-50
  • [2] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou; Zhao, Zishuo; Lin, Zhenzhou; Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023: 145-153
  • [3] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval
    Fei, Hongliang; Yu, Tan; Li, Ping
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 3644-3650
  • [4] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan; Zhou, Wengang; Wang, Min; Tian, Qi; Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30: 617-627
  • [5] Deep Multimodal Transfer Learning for Cross-Modal Retrieval
    Zhen, Liangli; Hu, Peng; Peng, Xi; Goh, Rick Siow Mong; Zhou, Joey Tianyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02): 798-810
  • [6] Scalable Deep Multimodal Learning for Cross-Modal Retrieval
    Hu, Peng; Zhen, Liangli; Peng, Dezhong; Liu, Pei
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019: 635-644
  • [7] Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
    Shukor, Mustafa; Couairon, Guillaume; Grechka, Asya; Cord, Matthieu
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4566-4577
  • [8] Deep supervised multimodal semantic autoencoder for cross-modal retrieval
    Tian, Yu; Yang, Wenjing; Liu, Qingsong; Yang, Qiong
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [9] Cross-Modal Retrieval using Random Multimodal Deep Learning
    Somasekar, Hemanth; Naveen, Kavya
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (02): 185-200
  • [10] Multimodal Multiclass Boosting and its Application to Cross-modal Retrieval
    Wang, Shixun; Dou, Zhi; Chen, Deng; Yu, Hairong; Li, Yuan; Pan, Peng
    NEUROCOMPUTING, 2019, 357: 11-23