Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Cited by: 0
Authors:
Hu, Xuming [1]
Guo, Zhijiang [2]
Teng, Zhiyang [3]
King, Irwin [4]
Yu, Philip S. [1,5]
Affiliations:
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Cambridge, Cambridge CB2 1TN, England
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[5] Univ Illinois, Chicago, IL USA
Keywords: Not available
DOI: Not available
CLC Classification Code: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multimodal relation extraction (MRE) is the task of identifying the semantic relationship between two entities based on the context of a sentence-image pair. Existing retrieval-augmented approaches have mainly focused on modeling the retrieved textual knowledge, which may not be sufficient to accurately identify complex relations. To improve prediction, this work proposes retrieving textual and visual evidence based on the object, the sentence, and the whole image. We further develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning within and across modalities. Extensive experiments and analyses show that the proposed method effectively selects and compares evidence across modalities and significantly outperforms state-of-the-art models. Code and data are available.
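The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the general idea it describes: retrieve top-k evidence at each granularity (object, sentence, whole image) from cross-modal memories of precomputed embeddings, then synthesize the pooled evidence with a cross-attention layer before classifying the relation. The embedding size, the retrieval depth, the 23-way output (the commonly cited size of the MNRE label set), and the fusion layout are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of multi-granularity cross-modal
# retrieval followed by attention-based synthesis for relation classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 256  # shared embedding size (assumption)

def retrieve_top_k(query: torch.Tensor, corpus: torch.Tensor, k: int = 3):
    """Return the k corpus embeddings most similar to the query vector."""
    sims = F.cosine_similarity(query.unsqueeze(0), corpus, dim=-1)  # (N,)
    return corpus[sims.topk(k).indices]                             # (k, D)

class EvidenceSynthesizer(nn.Module):
    """Fuse object-, sentence-, and image-level evidence via cross-attention."""
    def __init__(self, dim: int = D, num_relations: int = 23):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_relations)

    def forward(self, pair_repr: torch.Tensor, evidence: torch.Tensor):
        # pair_repr: (1, 1, D) entity-pair query; evidence: (1, M, D) memory.
        fused, _ = self.attn(pair_repr, evidence, evidence)
        return self.classifier(fused.squeeze(1))  # (1, num_relations) logits

# Toy usage: random vectors stand in for encoder outputs and retrieval indexes.
text_corpus = torch.randn(100, D)          # textual evidence memory
image_corpus = torch.randn(100, D)         # visual evidence memory
obj_q, sent_q, img_q = torch.randn(3, D)   # object / sentence / image queries

evidence = torch.cat([
    retrieve_top_k(obj_q, image_corpus),   # visual evidence for the objects
    retrieve_top_k(sent_q, text_corpus),   # textual evidence for the sentence
    retrieve_top_k(img_q, text_corpus),    # textual evidence for the image
]).unsqueeze(0)                            # (1, 9, D)

model = EvidenceSynthesizer()
logits = model(torch.randn(1, 1, D), evidence)
print(logits.shape)  # torch.Size([1, 23])
```

In this layout a single attention layer lets the entity-pair representation weigh evidence from both modalities jointly; the paper's actual synthesis of object-, image-, and sentence-level information may differ.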
Pages: 303-311
Number of pages: 9
Related Papers (50 records; 10 shown)
  • [1] Multimodal adversarial network for cross-modal retrieval
    Hu, Peng; Peng, Dezhong; Wang, Xu; Xiang, Yong
    KNOWLEDGE-BASED SYSTEMS, 2019, 180: 38-50
  • [2] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou; Zhao, Zishuo; Lin, Zhenzhou; Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023: 145-153
  • [3] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval
    Fei, Hongliang; Yu, Tan; Li, Ping
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021: 3644-3650
  • [4] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan; Zhou, Wengang; Wang, Min; Tian, Qi; Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30: 617-627
  • [5] Deep Multimodal Transfer Learning for Cross-Modal Retrieval
    Zhen, Liangli; Hu, Peng; Peng, Xi; Goh, Rick Siow Mong; Zhou, Joey Tianyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02): 798-810
  • [6] Scalable Deep Multimodal Learning for Cross-Modal Retrieval
    Hu, Peng; Zhen, Liangli; Peng, Dezhong; Liu, Pei
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019: 635-644
  • [7] Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
    Shukor, Mustafa; Couairon, Guillaume; Grechka, Asya; Cord, Matthieu
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4566-4577
  • [8] Deep supervised multimodal semantic autoencoder for cross-modal retrieval
    Tian, Yu; Yang, Wenjing; Liu, Qingsong; Yang, Qiong
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [9] Cross-Modal Retrieval using Random Multimodal Deep Learning
    Somasekar, Hemanth; Naveen, Kavya
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (02): 185-200
  • [10] Multimodal Multiclass Boosting and its Application to Cross-modal Retrieval
    Wang, Shixun; Dou, Zhi; Chen, Deng; Yu, Hairong; Li, Yuan; Pan, Peng
    NEUROCOMPUTING, 2019, 357: 11-23