GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22

Authors:

Luo, Chuwei [1]

Cheng, Changxu [1]

Zheng, Qi [1]

Yao, Cong [1]

Affiliations:

[1] DAMO Academy, Alibaba Group, Hangzhou, People's Republic of China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.00685

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation only implicitly, which has been found insufficient for RE, a task for which geometric information is especially crucial. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).


Pages: 7092-7101

Page count: 10
