MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding

Cited by: 0

Authors:

Li, Junlong [1]

Xu, Yiheng [2]

Cui, Lei [2]

Wei, Furu [2]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] Microsoft Research Asia, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images. However, there are still a large number of digital documents where the layout information is not fixed and needs to be interactively and dynamically rendered for visualization, which makes existing layout-based pre-training approaches difficult to apply. In this paper, we propose MarkupLM for document understanding tasks with markup languages as the backbone, such as HTML/XML-based documents, where text and markup information is jointly pre-trained. Experimental results show that the pre-trained MarkupLM significantly outperforms the existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.

Pages: 6078-6087

Page count: 10
