Visual Relation Extraction via Multi-modal Translation Embedding Based Model

Cited by: 0
Authors
Li, Zhichao [1 ]
Han, Yuping [1 ]
Xu, Yajing [1 ]
Gao, Sheng [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Visual relation extraction; Multi-modal network; Translation embedding;
DOI
10.1007/978-3-319-93034-3_43
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A visual relation, such as "person holds dog", is an effective semantic unit for image understanding, as well as a bridge connecting computer vision and natural language. Recent work has attempted to extract object features from images with the aid of their textual descriptions. However, very little work has combined multi-modal information to model subject-predicate-object relation triplets for deeper scene understanding. In this paper, we propose a novel visual relation extraction model, the Multi-modal Translation Embedding Based Model, which integrates visual information with a corresponding textual knowledge base. To this end, our model places the objects of an image and their semantic relationships in two separate low-dimensional spaces, where a relation is modeled as a simple translation vector connecting the entity descriptions in the knowledge graph. Moreover, we propose a visual phrase learning method that captures the interactions between objects in the image to further enhance visual relation extraction. Experiments on two real-world datasets show that our model benefits from incorporating language information into the relation embeddings and achieves significant improvements over state-of-the-art methods.
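The central idea in the abstract, modeling a relation as a translation vector so that subject + predicate lands near object, follows the TransE family of knowledge-graph embeddings. A minimal sketch of that scoring scheme (with hypothetical object/predicate vocabularies and randomly initialized, untrained vectors; the paper learns such embeddings jointly from visual and textual features, which this sketch does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding dimensionality (illustrative choice)

# Hypothetical vocabularies; in the paper these would come from
# detected objects and the textual knowledge base.
objects = {name: rng.normal(size=DIM) for name in ["person", "dog", "ball"]}
predicates = {name: rng.normal(size=DIM) for name in ["holds", "next_to"]}

def translation_score(subj: str, pred: str, obj: str) -> float:
    """Score a <subject, predicate, object> triplet.

    Under a translation embedding, subject + predicate should lie close
    to object, so a smaller L2 distance means a more plausible relation.
    """
    return float(np.linalg.norm(objects[subj] + predicates[pred] - objects[obj]))

# Rank candidate predicates for the object pair (person, dog):
# the lowest-scoring predicate is the model's best relation guess.
ranked = sorted(predicates, key=lambda p: translation_score("person", p, "dog"))
```

With trained embeddings, `ranked[0]` would be the predicted predicate for the pair; here the vectors are random, so the ranking only illustrates the mechanics.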
Pages: 538-548
Page count: 11
Related Papers
50 items in total
  • [21] Analyzing part functionality via multi-modal latent space embedding and interweaving
    Cui, Jiahao
    Li, Shuai
    Hou, Fei
    Hao, Aimin
    Qin, Hong
    COMPUTERS & GRAPHICS-UK, 2023, 115 : 1 - 12
  • [22] Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation
    Parida, Shantipriya
    Bojar, Ondrej
    Dash, Satya Ranjan
    COMPUTACION Y SISTEMAS, 2019, 23 (04): : 1499 - 1505
  • [23] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    Journal of Computer Science and Technology, 2023, 38 : 1223 - 1236
  • [24] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (06) : 1223 - 1236
  • [25] A Unified MRC Framework with Multi-Query for Multi-modal Relation Triplets Extraction
    Chen, Qiang
    Zhang, Dong
    Li, Shoushan
    Zhou, Guodong
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 552 - 557
  • [26] A multi-modal extraction integrated model for neuropsychiatric disorders classification
    Liu, Liangliang
    Liu, Zhihong
    Chang, Jing
    Xu, Xu
    PATTERN RECOGNITION, 2024, 155
  • [27] HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
    Peng, Ru
    Zeng, Yawen
    Zhao, Junbo
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 380 - 388
  • [28] Visual Prompt Multi-Modal Tracking
    Zhu, Jiawen
    Lai, Simiao
    Chen, Xin
    Wang, Dong
    Lu, Huchuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9516 - 9526
  • [29] VISUAL AS MULTI-MODAL ARGUMENTATION IN LAW
    Novak, Marko
    BRATISLAVA LAW REVIEW, 2021, 5 (01): : 91 - 110
  • [30] Product image extraction model construction based on multi-modal implicit measurement of unconsciousness
    Guo Z.
    Lin L.
    Yang M.
    Zhang Y.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2022, 28 (04): : 1150 - 1163