Enhancing Visual Information Extraction with Large Language Models Through Layout-Aware Instruction Tuning

Cited by: 0
Authors
Li, Teng [1]
Wang, Jiapeng [1]
Jin, Lianwen [1]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual Information Extraction; Large Language Model; Instruction Tuning
DOI
10.1007/978-981-97-8511-7_20
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recently, leveraging large language models (LLMs) for visually rich document information extraction has made significant progress. Previous studies have simplified visual information extraction into a document visual question answering task, in which each question-answer turn yields a single entity and serves mainly to validate the document understanding capabilities of LLMs. However, such methods incur substantial computational cost when multiple entities must be extracted from a single document, a scenario common in practical document digitization. This paper builds upon a large language model and incorporates document layout information through a document layout modeling branch. We also design a layout-aware and task-specific instruction set. To further enhance the model's proficiency in learning document layout information, we first augment the tokenizer's vocabulary; the entire model is then fine-tuned to adapt to the expanded vocabulary and to extract document layout features effectively. By harnessing the language comprehension capabilities of LLMs, our model can perform comprehensive entity extraction for an entire document in a single pass. Benefiting from the generative nature of large language models, a single model can accomplish multiple downstream visual information extraction tasks. Experimental results demonstrate consistent improvements over the baseline model across a range of document visual information extraction tasks.
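The abstract describes augmenting the tokenizer's vocabulary before full-model fine-tuning so that document layout information can be represented. As a rough illustration of that general pattern (not the paper's actual implementation), the sketch below adds hypothetical discretized coordinate tokens to a HuggingFace-style tokenizer and resizes the model's embedding matrix; the base model name, the "<loc_i>" token format, and the 1,000-bucket granularity are all assumptions.

# Minimal sketch: extend a tokenizer with discretized layout-coordinate
# tokens, then resize the LLM's embedding matrix so the new rows can be
# learned during full-model fine-tuning. The "<loc_i>" token format, the
# 1,000-bucket granularity, and the base model are illustrative assumptions,
# not the paper's actual design.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder base LLM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# One token per normalized coordinate bucket (x or y scaled to [0, 1000)).
layout_tokens = [f"<loc_{i}>" for i in range(1000)]
num_added = tokenizer.add_tokens(layout_tokens, special_tokens=True)

# Newly added embedding rows are randomly initialized; the subsequent
# fine-tuning stage adapts the whole model to the expanded vocabulary.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} layout tokens; vocabulary size is now {len(tokenizer)}")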
Pages: 276-289
Number of pages: 14