Enhancing Visual Information Extraction with Large Language Models Through Layout-Aware Instruction Tuning

Cited by: 0
Authors
Li, Teng [1]
Wang, Jiapeng [1]
Jin, Lianwen [1]
Affiliations
[1] South China University of Technology, Guangzhou, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Visual Information Extraction; Large Language Model; Instruction Tuning
DOI
10.1007/978-981-97-8511-7_20
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, leveraging large language models (LLMs) for information extraction from visually rich documents has made significant progress. Previous studies have simplified visual information extraction into a document visual question answering task, in which each question-answer session yields a single entity and serves mainly to validate the document understanding capabilities of LLMs. However, these methods face significant computational efficiency and cost challenges when multiple entities must be extracted from a single document, a scenario that is common in practical document digitization. This paper builds upon a large language model and incorporates document layout information through a document layout modeling branch. We also design a layout-aware, task-specific instruction set. To further strengthen the model's ability to learn document layout information, we first augment the tokenizer's vocabulary and then fine-tune the entire model so that it adapts to the expanded vocabulary and extracts document layout features effectively. By harnessing the language comprehension capabilities of LLMs, our model can extract all entities from an entire document in a single pass. Because the model is generative, a single model can handle multiple downstream visual information extraction tasks. Experimental results demonstrate consistent improvement over the baseline model across a range of document visual information extraction tasks.
Pages: 276-289
Number of pages: 14