GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22
Authors
Luo, Chuwei [1]
Cheng, Changxu [1]
Zheng, Qi [1]
Yao, Cong [1]
Affiliations
[1] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00685
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation implicitly, which has been found insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state-of-the-art methods for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).
Pages: 7092-7101
Page count: 10