GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Cited by: 22
Authors
Luo, Chuwei [1 ]
Cheng, Changxu [1 ]
Zheng, Qi [1 ]
Yao, Cong [1 ]
Institutions
[1] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00685
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn geometric representations only implicitly, which has proven insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%).
Pages: 7092-7101 (10 pages)
Related Papers (50 total)
  • [31] Silver Syntax Pre-training for Cross-Domain Relation Extraction
    Bassignana, Elisa
    Ginter, Filip
    Pyysalo, Sampo
    van der Goot, Rob
    Plank, Barbara
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 6984 - 6993
  • [32] Cross-lingual Visual Pre-training for Multimodal Machine Translation
    Caglayan, Ozan
    Kuyu, Menekse
    Amac, Mustafa Sercan
    Madhyastha, Pranava
    Erdem, Erkut
    Erdem, Aykut
    Specia, Lucia
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1317 - 1324
  • [33] Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
    Su, Tongkun
    Li, Jun
    Zhang, Xi
    Jin, Haibo
    Chen, Hao
    Wang, Qiong
    Lv, Faqin
    Zhao, Baoliang
    Hu, Ying
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT IV, 2024, 15004 : 602 - 612
  • [34] Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
    Jing, Ya
    Zhu, Xuelin
    Liu, Xingbin
    Sima, Qie
    Yang, Taozheng
    Feng, Yunhai
    Kong, Tao
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 11390 - 11395
  • [35] UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
    Li, Zhaowen
    Zhu, Yousong
    Yang, Fan
    Li, Wei
    Zhao, Chaoyang
    Chen, Yingying
    Chen, Zhiyang
    Xie, Jiahao
    Wu, Liwei
    Zhao, Rui
    Tang, Ming
    Wang, Jinqiao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14607 - 14616
  • [36] Zero-shot Key Information Extraction from Mixed-Style Tables: Pre-training on Wikipedia
    Yang, Qingping
    Hu, Yingpeng
    Cao, Rongyu
    Li, Hongwei
    Luo, Ping
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1451 - 1456
  • [38] Masked Feature Prediction for Self-Supervised Visual Pre-Training
    Wei, Chen
    Fan, Haoqi
    Xie, Saining
    Wu, Chao-Yuan
    Yuille, Alan
    Feichtenhofer, Christoph
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14648 - 14658
  • [39] Pre-training to Match for Unified Low-shot Relation Extraction
    Liu, Fangchao
    Lin, Hongyu
    Han, Xianpei
    Cao, Boxi
    Sun, Le
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5785 - 5795
  • [40] Learning to See before Learning to Act: Visual Pre-training for Manipulation
    Lin, Yen-Chen
    Zeng, Andy
    Song, Shuran
    Isola, Phillip
    Lin, Tsung-Yi
    2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 7286 - 7293