GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22
Authors
Luo, Chuwei [1 ]
Cheng, Changxu [1 ]
Zheng, Qi [1 ]
Yao, Cong [1 ]
Institutions
[1] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00685
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn geometric representations only implicitly, which has proven insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).
Pages: 7092-7101 (10 pages)
Related Papers (50 total)
  • [31] Silver Syntax Pre-training for Cross-Domain Relation Extraction
    Bassignana, Elisa
    Ginter, Filip
    Pyysalo, Sampo
    van der Goot, Rob
    Plank, Barbara
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 6984 - 6993
  • [32] Cross-lingual Visual Pre-training for Multimodal Machine Translation
    Caglayan, Ozan
    Kuyu, Menekse
    Amac, Mustafa Sercan
    Madhyastha, Pranava
    Erdem, Erkut
    Erdem, Aykut
    Specia, Lucia
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1317 - 1324
  • [33] Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
    Su, Tongkun
    Li, Jun
    Zhang, Xi
    Jin, Haibo
    Chen, Hao
    Wang, Qiong
    Lv, Faqin
    Zhao, Baoliang
    Hu, Ying
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT IV, 2024, 15004 : 602 - 612
  • [34] Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
    Jing, Ya
    Zhu, Xuelin
    Liu, Xingbin
    Sima, Qie
    Yang, Taozheng
    Feng, Yunhai
    Kong, Tao
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 11390 - 11395
  • [35] UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
    Li, Zhaowen
    Zhu, Yousong
    Yang, Fan
    Li, Wei
    Zhao, Chaoyang
    Chen, Yingying
    Chen, Zhiyang
    Xie, Jiahao
    Wu, Liwei
    Zhao, Rui
    Tang, Ming
    Wang, Jinqiao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14607 - 14616
  • [36] Zero-shot Key Information Extraction from Mixed-Style Tables: Pre-training on Wikipedia
    Yang, Qingping
    Hu, Yingpeng
    Cao, Rongyu
    Li, Hongwei
    Luo, Ping
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1451 - 1456
  • [38] Masked Feature Prediction for Self-Supervised Visual Pre-Training
    Wei, Chen
    Fan, Haoqi
    Xie, Saining
    Wu, Chao-Yuan
    Yuille, Alan
    Feichtenhofer, Christoph
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14648 - 14658
  • [39] Pre-training to Match for Unified Low-shot Relation Extraction
    Liu, Fangchao
    Lin, Hongyu
    Han, Xianpei
    Cao, Boxi
    Sun, Le
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5785 - 5795
  • [40] Learning to See before Learning to Act: Visual Pre-training for Manipulation
    Lin, Yen-Chen
    Zeng, Andy
    Song, Shuran
    Isola, Phillip
    Lin, Tsung-Yi
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 7286 - 7293