Visual Relation Extraction via Multi-modal Translation Embedding Based Model

Cited by: 0
Authors
Li, Zhichao [1]
Han, Yuping [1]
Xu, Yajing [1]
Gao, Sheng [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Keywords
Visual relation extraction; Multi-modal network; Translation embedding
DOI
10.1007/978-3-319-93034-3_43
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A visual relation, such as "person holds dog", is an effective semantic unit for image understanding, as well as a bridge between computer vision and natural language. Recent work extracts the features of objects in an image with the aid of their textual descriptions. However, very little work has combined multi-modal information to model subject-predicate-object relation triplets for deeper scene understanding. In this paper, we propose a novel visual relation extraction model, the Multi-modal Translation Embedding Based Model, which integrates visual information with the corresponding textual knowledge base. Our model places the objects of the image and their semantic relationships in two different low-dimensional spaces, where a relation can be modeled as a simple translation vector connecting the entity descriptions in the knowledge graph. We also propose a visual phrase learning method that captures the interactions between objects in the image to improve visual relation extraction. Experiments on two real-world datasets show that the model benefits from incorporating language information into the relation embeddings and provides significant improvement over state-of-the-art methods.
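The translation idea mentioned in the abstract can be illustrated with a minimal, generic sketch in the style of translation embeddings, where a triplet is plausible when subject + predicate lands close to object. This is not the paper's model: the abstract does not give the scoring function, the embedding dimensions, or how the two spaces are combined, so the single shared space, the function names `translation_score` and `rank_predicates`, and the hand-picked 3-d vectors below are all assumptions made for illustration.

```python
import math

def translation_score(subject, predicate, obj):
    """Negative L2 distance between (subject + predicate) and obj; higher is better."""
    return -math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj)))

def rank_predicates(subject, obj, predicates):
    """Return predicate names sorted from best to worst translation score."""
    return sorted(
        predicates,
        key=lambda name: translation_score(subject, predicates[name], obj),
        reverse=True,
    )

# Hypothetical embeddings, chosen by hand (not learned, not from the paper).
person = [0.0, 1.0, 0.0]
dog = [1.0, 1.0, 0.5]
predicates = {
    "holds": [1.0, 0.0, 0.5],   # person + holds equals dog exactly
    "rides": [0.0, -1.0, 1.0],
    "under": [-1.0, 0.5, 0.0],
}

ranking = rank_predicates(person, dog, predicates)
print(ranking)
```

In a trained model the vectors would be learned so that observed triplets score higher than corrupted ones; here they are fixed only to show how a predicate can be picked by comparing translation distances.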
Pages: 538-548
Number of pages: 11