Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Cited by: 0
Authors
Hu, Xuming [1 ]
Guo, Zhijiang [2 ]
Teng, Zhiyang [3 ]
King, Irwin [4 ]
Yu, Philip S. [1 ,5 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Cambridge, Cambridge CB2 1TN, England
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[5] Univ Illinois, Chicago, IL USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal relation extraction (MRE) is the task of identifying the semantic relationship between two entities based on the context of a sentence-image pair. Existing retrieval-augmented approaches have mainly focused on modeling the retrieved textual knowledge, which may not be able to accurately identify complex relations. To improve prediction, this work proposes to retrieve textual and visual evidence based on the object, the sentence, and the whole image. We further develop a novel approach to synthesize object-level, image-level, and sentence-level information for better reasoning within and across modalities. Extensive experiments and analyses show that the proposed method effectively selects and compares evidence across modalities and significantly outperforms state-of-the-art models. Code and data are available.
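The retrieve-then-synthesize pipeline the abstract describes can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in: the vectors play the role of learned object/sentence/image embeddings, and the mean-pooling fusion and prototype-matching classifier are simple illustrative choices, not the paper's actual encoders or fusion module.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, corpus, k=1):
    # Return the names of the k corpus entries most similar to the query
    # vector. In the real system the query would be an object crop, the
    # sentence, or the whole image, each embedded by a learned encoder.
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

def synthesize(evidence_vecs):
    # Fuse evidence from the three granularities by element-wise mean
    # (one simple fusion choice; a learned attention module is typical).
    n = len(evidence_vecs)
    return [sum(vals) / n for vals in zip(*evidence_vecs)]

def classify_relation(fused, relation_prototypes):
    # Predict the relation whose prototype vector best matches the fused
    # evidence (stand-in for a trained relation classifier).
    return max(relation_prototypes,
               key=lambda r: cosine(fused, relation_prototypes[r]))

if __name__ == "__main__":
    # Toy evidence corpus: one entry "matches" each query granularity.
    corpus = {"e1": [1.0, 0.0, 0.0],
              "e2": [0.0, 1.0, 0.0],
              "e3": [0.0, 0.0, 1.0]}
    queries = {"object":   [1.0, 0.0, 0.0],
               "sentence": [0.0, 1.0, 0.0],
               "image":    [0.0, 0.0, 1.0]}
    evidence = [corpus[retrieve(q, corpus)[0]] for q in queries.values()]
    fused = synthesize(evidence)
    prototypes = {"rel_a": [1.0, 1.0, 1.0], "rel_b": [1.0, -1.0, 0.0]}
    print(classify_relation(fused, prototypes))
```

The three retrievals make the granularity structure explicit: each query level can surface different evidence, and the fusion step is where cross-modal comparison happens before classification.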
Pages: 303-311 (9 pages)