Visual Relation Extraction via Multi-modal Translation Embedding Based Model

Cited by: 0
Authors
Li, Zhichao [1 ]
Han, Yuping [1 ]
Xu, Yajing [1 ]
Gao, Sheng [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Visual relation extraction; Multi-modal network; Translation embedding;
DOI
10.1007/978-3-319-93034-3_43
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A visual relation, such as "person holds dog", is an effective semantic unit for image understanding, as well as a bridge connecting computer vision and natural language. Recent work has attempted to extract object features from images with the aid of their textual descriptions. However, very little work has combined multi-modal information to model subject-predicate-object relation triplets for deeper scene understanding. In this paper, we propose a novel visual relation extraction model, the Multi-modal Translation Embedding Based Model, which integrates visual information with a corresponding textual knowledge base. To this end, our model places the objects of an image and their semantic relationships in two separate low-dimensional spaces, where a relation is modeled as a simple translation vector connecting the entity descriptions in the knowledge graph. Moreover, we propose a visual phrase learning method that captures the interactions between objects in the image to further enhance visual relation extraction. Experiments on two real-world datasets show that our model benefits from incorporating language information into the relation embeddings and achieves significant improvements over state-of-the-art methods.
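The central idea in the abstract, modeling a relation as a translation vector so that subject + predicate lands near object, follows the TransE family of knowledge-graph embeddings. A minimal sketch of that scoring scheme (with hypothetical object/predicate vocabularies and randomly initialized, untrained vectors; the paper learns such embeddings jointly from visual and textual features, which this sketch does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding dimensionality (illustrative choice)

# Hypothetical vocabularies; in the paper these would come from
# detected objects and the textual knowledge base.
objects = {name: rng.normal(size=DIM) for name in ["person", "dog", "ball"]}
predicates = {name: rng.normal(size=DIM) for name in ["holds", "next_to"]}

def translation_score(subj: str, pred: str, obj: str) -> float:
    """Score a <subject, predicate, object> triplet.

    Under a translation embedding, subject + predicate should lie close
    to object, so a smaller L2 distance means a more plausible relation.
    """
    return float(np.linalg.norm(objects[subj] + predicates[pred] - objects[obj]))

# Rank candidate predicates for the object pair (person, dog):
# the lowest-scoring predicate is the model's best relation guess.
ranked = sorted(predicates, key=lambda p: translation_score("person", p, "dog"))
```

With trained embeddings, `ranked[0]` would be the predicted predicate for the pair; here the vectors are random, so the ranking only illustrates the mechanics.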
Pages: 538-548
Page count: 11
Related Papers
50 items in total
  • [21] Analyzing part functionality via multi-modal latent space embedding and interweaving
    Cui, Jiahao
    Li, Shuai
    Hou, Fei
    Hao, Aimin
    Qin, Hong
    COMPUTERS & GRAPHICS-UK, 2023, 115 : 1 - 12
  • [22] Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation
    Parida, Shantipriya
    Bojar, Ondrej
    Dash, Satya Ranjan
    COMPUTACION Y SISTEMAS, 2019, 23 (04): : 1499 - 1505
  • [23] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    Journal of Computer Science and Technology, 2023, 38 : 1223 - 1236
  • [24] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (06) : 1223 - 1236
  • [25] A Unified MRC Framework with Multi-Query for Multi-modal Relation Triplets Extraction
    Chen, Qiang
    Zhang, Dong
    Li, Shoushan
    Zhou, Guodong
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 552 - 557
  • [26] A multi-modal extraction integrated model for neuropsychiatric disorders classification
    Liu, Liangliang
    Liu, Zhihong
    Chang, Jing
    Xu, Xu
    PATTERN RECOGNITION, 2024, 155
  • [27] HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
    Peng, Ru
    Zeng, Yawen
    Zhao, Junbo
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 380 - 388
  • [28] Visual Prompt Multi-Modal Tracking
    Zhu, Jiawen
    Lai, Simiao
    Chen, Xin
    Wang, Dong
    Lu, Huchuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9516 - 9526
  • [29] VISUAL AS MULTI-MODAL ARGUMENTATION IN LAW
    Novak, Marko
    BRATISLAVA LAW REVIEW, 2021, 5 (01): : 91 - 110
  • [30] Product image extraction model construction based on multi-modal implicit measurement of unconsciousness
    Guo Z.
    Lin L.
    Yang M.
    Zhang Y.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2022, 28 (04): : 1150 - 1163