GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22
Authors
Luo, Chuwei [1]
Cheng, Changxu [1]
Zheng, Qi [1]
Yao, Cong [1]
Affiliation
[1] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00685
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation implicitly, which has been found insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we find that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. In extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms previous state-of-the-art methods for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).
Pages: 7092-7101
Page count: 10