Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Cited by: 5
Authors
He, Wentao [1 ]
Ma, Hanjie [1 ]
Li, Shaohua [1 ]
Dong, Hui [2 ]
Zhang, Haixiang [1 ]
Feng, Jie [1 ]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
[2] Hangzhou Codvis Technol Co Ltd, Hangzhou 311100, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Iss. 22
Keywords
multimodal relation extraction; small multimodal guidance; multimodal relation data augmentation; flexible threshold loss; large language model;
DOI
10.3390/app132212208
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Multimodal Relation Extraction (MRE) is a core task for constructing Multimodal Knowledge Graphs (MKGs). Most current research fine-tunes small-scale single-modal image and text pre-trained models, but we find that image-text datasets drawn from web media suffer from data scarcity, simple text, and abstract image content, so the task requires substantial external knowledge for supplementation and reasoning. We use Multimodal Relation Data Augmentation (MRDA) to address the data scarcity problem in MRE, and propose a Flexible Threshold Loss (FTL) to handle the imbalanced entity-pair distribution and long-tailed classes. Using the small model as a guide, we take its prompt information and employ a Large Language Model (LLM) as a knowledge engine to supply common sense and reasoning ability. Notably, both stages of our framework are flexibly replaceable: the first stage adapts to any multimodal classification task handled by a small model, and the second stage can be swapped for a more powerful LLM. In experiments, our EMRE2llm framework achieves state-of-the-art performance on the challenging MNRE dataset, reaching an 82.95% F1 score on the test set.
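
Note on FTL (illustrative only): the abstract names a Flexible Threshold Loss but does not give its formulation. The sketch below is a minimal PyTorch rendering, assuming FTL follows the adaptive-threshold idea used in document-level relation extraction, where a learnable threshold class TH is trained to rank below every gold relation and above every negative one, so each entity pair gets its own decision boundary instead of a fixed global cutoff. The function name flexible_threshold_loss and the th_index convention are hypothetical, not taken from the authors' code.

import torch
import torch.nn.functional as F

def flexible_threshold_loss(logits, labels, th_index=0):
    # logits: (batch, num_classes) relation scores for each entity pair.
    # labels: (batch, num_classes) multi-hot gold relations; the reserved
    # threshold class at th_index is never a gold label.
    th_mask = torch.zeros_like(labels)
    th_mask[:, th_index] = 1.0
    # Rank every gold relation above TH: softmax over {gold classes, TH}.
    keep_pos = labels + th_mask
    logit_pos = logits.masked_fill(keep_pos == 0, float("-inf"))
    loss_pos = -(F.log_softmax(logit_pos, dim=-1) * labels).sum(dim=-1)
    # Rank TH above every negative relation: softmax over {negatives, TH}.
    keep_neg = 1.0 - labels  # negatives plus TH itself
    logit_neg = logits.masked_fill(keep_neg == 0, float("-inf"))
    loss_neg = -(F.log_softmax(logit_neg, dim=-1) * th_mask).sum(dim=-1)
    return (loss_pos + loss_neg).mean()

Because the boundary is learned per instance rather than fixed globally, head and tail classes are not forced through one shared threshold, which is one plausible way such a loss could address the long-tailed distribution the abstract mentions.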
Pages: 14
Related Papers
50 in total (entries [41]-[50] shown)
  • [41] Large Language Models Empower Multimodal Integrated Sensing and Communication
    Cheng, Lu
    Zhang, Hongliang
    Di, Boya
    Niyato, Dusit
    Song, Lingyang
    IEEE COMMUNICATIONS MAGAZINE, 2025,
  • [42] Enhancing Urban Walkability Assessment with Multimodal Large Language Models
    Blecic, Ivan
    Saiu, Valeria
    Trunfio, Giuseppe A.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2024 WORKSHOPS, PT V, 2024, 14819 : 394 - 411
  • [43] Multimodal Large Language Models as Built Environment Auditing Tools
    Jang, Kee Moon
    Kim, Junghwan
PROFESSIONAL GEOGRAPHER, 2025, 77 (01) : 84 - 90
  • [44] QueryMintAI: Multipurpose Multimodal Large Language Models for Personal Data
    Ghosh, Ananya
    Deepa, K.
    IEEE ACCESS, 2024, 12 : 144631 - 144651
  • [45] UniCode: Learning a Unified Codebook for Multimodal Large Language Models
    Zheng, Sipeng
    Zhou, Bohan
    Feng, Yicheng
    Wang, Ye
    Lu, Zongqing
    COMPUTER VISION - ECCV 2024, PT VIII, 2025, 15066 : 426 - 443
  • [46] BLINK: Multimodal Large Language Models Can See but Not Perceive
    Fu, Xingyu
    Hu, Yushi
    Li, Bangzheng
    Feng, Yu
    Wang, Haoyu
    Lin, Xudong
    Roth, Dan
    Smith, Noah A.
    Ma, Wei-Chiu
    Krishna, Ranjay
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 148 - 166
  • [47] Generating Images with Multimodal Language Models
    Koh, Jing Yu
    Fried, Daniel
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [48] Harnessing multimodal approaches for depression detection using large language models and facial expressions
    Sadeghi, Misha
    Richer, Robert
    Egger, Bernhard
    Schindler-Gmelch, Lena
    Rupp, Lydia Helene
    Rahimi, Farnaz
    Berking, Matthias
    Eskofier, Bjoern M.
    NPJ MENTAL HEALTH RESEARCH, 3 (1):
  • [49] A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
    Wu, Tianhe
    Ma, Kede
    Liang, Jie
    Yang, Yujiu
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT LXXIV, 2025, 15132 : 143 - 160
  • [50] Incorporating Molecular Knowledge in Large Language Models via Multimodal Modeling
    Yang, Zekun
    Lv, Kun
    Shu, Jian
    Li, Zheng
    Xiao, Ping
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2025,