GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22

Authors:

Luo, Chuwei [1]

Cheng, Changxu [1]

Zheng, Qi [1]

Yao, Cong [1]

Affiliations:

[1] DAMO Academy, Alibaba Group, Hangzhou, People's Republic of China

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.00685

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation implicitly, which has been found insufficient for RE, a task for which geometric information is especially crucial. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. In extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores on the SER task and significantly outperforms the previous state-of-the-art methods for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).

Citation

Pages: 7092-7101

Number of pages: 10
