Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Cited by: 5
Authors
He, Wentao [1 ]
Ma, Hanjie [1 ]
Li, Shaohua [1 ]
Dong, Hui [2 ]
Zhang, Haixiang [1 ]
Feng, Jie [1 ]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
[2] Hangzhou Codvis Technol Co Ltd, Hangzhou 311100, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 22
Keywords
multimodal relation extraction; small multimodal guidance; multimodal relation data augmentation; flexible threshold loss; large language model;
DOI
10.3390/app132212208
CLC Classification Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Multimodal Relation Extraction (MRE) is a core task for constructing Multimodal Knowledge Graphs (MKGs). Most current research fine-tunes small-scale single-modal image and text pre-trained models, but we find that image-text datasets collected from online media suffer from data scarcity, simplistic text, and abstract image content, all of which require substantial external knowledge for supplementation and reasoning. We use Multimodal Relation Data Augmentation (MRDA) to address the data scarcity problem in MRE, and propose a Flexible Threshold Loss (FTL) to handle the imbalanced entity-pair distribution and long-tailed relation classes. After obtaining prompt information from the small model, which serves as a guide, we employ a Large Language Model (LLM) as a knowledge engine to supply common sense and reasoning ability. Notably, both stages of our framework are interchangeable: the first stage can be adapted to other multimodal classification tasks for small models, and the second stage can be replaced by more powerful LLMs. In experiments, our EMRE2llm framework achieves state-of-the-art performance on the challenging MNRE dataset, reaching an F1 score of 82.95% on the test set.
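The abstract names two concrete mechanisms: a Flexible Threshold Loss for the imbalanced, long-tailed relation labels, and small-model predictions used as prompt guidance for an LLM. The sketch below illustrates both, under the assumption that the FTL follows the adaptive-threshold family of losses (a learnable threshold class separates positive from negative relations); all function names, signatures, and the prompt wording are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the two-stage idea described in the abstract.
# Assumption: FTL is an adaptive-threshold-style loss with a reserved
# "threshold class" at column 0 of the logits; the small model's top-k
# relation scores are then serialized into a guidance prompt for an LLM.
import torch
import torch.nn.functional as F

def flexible_threshold_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits/labels: (batch, 1 + num_relations); column 0 is the threshold class.

    labels is multi-hot over the true relations, with column 0 always zero.
    """
    th_label = torch.zeros_like(labels)
    th_label[:, 0] = 1.0
    pos_mask = labels + th_label          # true relations plus the threshold class
    neg_mask = 1.0 - labels               # everything except the true relations

    # Rank each true relation above the per-example threshold class.
    pos_logits = logits - (1.0 - pos_mask) * 1e30
    loss_pos = -(F.log_softmax(pos_logits, dim=-1) * labels).sum(dim=-1)

    # Rank the threshold class above every negative relation.
    neg_logits = logits - (1.0 - neg_mask) * 1e30
    loss_neg = -(F.log_softmax(neg_logits, dim=-1) * th_label).sum(dim=-1)

    return (loss_pos + loss_neg).mean()

def build_guidance_prompt(sentence: str, head: str, tail: str,
                          top_relations: list[tuple[str, float]]) -> str:
    """Turn the small model's top-k relation scores into guidance for an LLM."""
    candidates = ", ".join(f"{rel} ({score:.2f})" for rel, score in top_relations)
    return (
        f"Sentence: {sentence}\n"
        f"Head entity: {head}\nTail entity: {tail}\n"
        f"A small multimodal model ranks the candidate relations as: {candidates}.\n"
        "Using common-sense knowledge, answer with the single most plausible relation."
    )
```

The appeal of a learnable per-example threshold over a fixed global cutoff is that each entity pair sets its own decision boundary, which is one plausible way to cope with the skewed entity-pair and class distributions the abstract mentions.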
Pages: 14
Related Papers
50 records in total
  • [1] Instruction Tuning Large Language Models for Multimodal Relation Extraction Using LoRA
    Li, Zou
    Pang, Ning
    Zhao, Xiang
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 364 - 376
  • [2] A survey on multimodal large language models
    Yin, Shukang
    Fu, Chaoyou
    Zhao, Sirui
    Li, Ke
    Sun, Xing
    Xu, Tong
    Chen, Enhong
    NATIONAL SCIENCE REVIEW, 2024, 11 (12) : 277 - 296
  • [3] From Large Language Models to Large Multimodal Models: A Literature Review
    Huang, Dawei
    Yan, Chuan
    Li, Qing
    Peng, Xiaojiang
    APPLIED SCIENCES-BASEL, 2024, 14 (12)
  • [4] A comprehensive survey of large language models and multimodal large models in medicine
    Xiao, Hanguang
    Zhou, Feizhong
    Liu, Xingyue
    Liu, Tianqi
    Li, Zhipeng
    Liu, Xin
    Huang, Xiaoxuan
    INFORMATION FUSION, 2025, 117
  • [5] InteraRec: Interactive Recommendations Using Multimodal Large Language Models
    Karra, Saketh Reddy
    Tulabandhula, Theja
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA, 2024, 14658 : 32 - 43
  • [6] Multimodal Large Language Models in Vision and Ophthalmology
    Lu, Zhiyong
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [7] The application of multimodal large language models in medicine
    Qiu, Jianing
    Yuan, Wu
    Lam, Kyle
    LANCET REGIONAL HEALTH-WESTERN PACIFIC, 2024, 45
  • [8] Large language models and multimodal foundation models for precision oncology
    Truhn, Daniel
    Eckardt, Jan-Niklas
    Ferber, Dyke
    Kather, Jakob Nikolas
    NPJ PRECISION ONCOLOGY, 2024, 8 (01)
  • [9] Visual cognition in multimodal large language models
    Buschoff, Luca M. Schulze
    Akata, Elif
    Bethge, Matthias
    Schulz, Eric
    NATURE MACHINE INTELLIGENCE, 2025, 7 (01) : 96 - 106