ERNIE-DOC: A Retrospective Long-Document Modeling Transformer

Cited by: 0
Authors: Ding, Siyu [1]; Shang, Junyuan [1]; Wang, Shuohuan [1]; Sun, Yu [1]; Tian, Hao [1]; Wu, Hua [1]; Wang, Haifeng [1]
Affiliation: [1] Baidu Inc., Beijing, People's Republic of China
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Transformers are not suited to processing long documents because their memory and time consumption grow quadratically with sequence length. Simply truncating a long document, or applying a sparse attention mechanism, either incurs the context-fragmentation problem or yields inferior modeling capability relative to models of comparable size. In this paper, we propose ERNIE-DOC, a document-level language pretraining model based on Recurrence Transformers (Dai et al., 2019). Two well-designed techniques, the retrospective feed mechanism and the enhanced recurrence mechanism, give ERNIE-DOC a much longer effective context length and enable it to capture the contextual information of a complete document. We pretrain ERNIE-DOC to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Experiments were conducted on both English and Chinese document-level tasks. ERNIE-DOC improved the state-of-the-art language modeling perplexity on WikiText-103 to 16.8. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.
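To make the two mechanisms named in the abstract concrete, the following is a minimal PyTorch sketch of segment-level recurrence with a same-layer memory update and a two-pass feed. It is not the authors' implementation: ToyLayer, encode_segment, and all sizes are illustrative assumptions; only the memory-update rule and the double feed of the document's segments follow the description above.

```python
# Minimal sketch of ERNIE-DOC's two key ideas (illustrative, not the paper's code).
import torch
import torch.nn as nn

N_LAYERS, D_MODEL, SEG_LEN = 4, 64, 16

class ToyLayer(nn.Module):
    """Stand-in for one Transformer layer that attends over cached memory."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=4, batch_first=True)
        self.ff = nn.Linear(D_MODEL, D_MODEL)

    def forward(self, x, mem):
        # Keys/values cover [previous-segment memory; current segment],
        # as in Recurrence Transformers (Transformer-XL style).
        kv = x if mem is None else torch.cat([mem, x], dim=1)
        out, _ = self.attn(x, kv, kv)
        return x + self.ff(out)

layers = nn.ModuleList([ToyLayer() for _ in range(N_LAYERS)])

def encode_segment(x, mems):
    """Encode one segment and return the per-layer memories for the next one.

    Enhanced recurrence (as described in the abstract): the memory cached for
    layer l is layer l's OWN output on the previous segment, rather than layer
    l-1's output as in Transformer-XL, which lengthens the effective context.
    """
    new_mems, h = [], x
    for l, layer in enumerate(layers):
        h = layer(h, None if mems is None else mems[l])
        new_mems.append(h.detach())  # same-layer memory for the next segment
    return h, new_mems

# Retrospective feed: every segment of the document is fed twice. After the
# first (skimming) pass, the recurrent memory already reflects the whole
# document, so segments in the second (retrospective) pass see document-level
# context instead of only the preceding segments.
document = [torch.randn(1, SEG_LEN, D_MODEL) for _ in range(3)]  # 3 toy segments
mems = None
for phase in ("skimming", "retrospective"):
    for segment in document:
        hidden, mems = encode_segment(segment, mems)
```

The only line that distinguishes this sketch from a plain Transformer-XL loop is the one caching the current layer's own output as memory; the retrospective behavior comes entirely from carrying the memories into a second pass over the same segments.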
Pages: 2914-2927
Page count: 14
Related Papers (29 in total)
  • [1] Efficient Memory-Enhanced Transformer for Long-Document Summarization in Low-Resource Regimes. Moro, Gianluca; Ragazzi, Luca; Valgimigli, Lorenzo; Frisoni, Giacomo; Sartori, Claudio; Marfia, Gustavo. SENSORS, 2023, 23 (07)
  • [2] Segmented Summarization and Refinement: A Pipeline for Long-Document Analysis on Social Media. Wang, Guanghua; Garg, Priyanshi; Wu, Weili. Journal of Social Computing, 2024, 5 (02): 132-144
  • [3] Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling. Wu, Chuhan; Wu, Fangzhao; Qi, Tao; Huang, Yongfeng. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021: 848-853
  • [4] Doc-Former: A transformer-based document shadow denoising network. Pei, Shengchang; Liu, Jun; Yi, Niannian; Zhang, Yun; Liu, Zhengtao; Chen, Zengyan. 2023 THE 6TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA 2023, 2023: 139-143
  • [5] Hierarchical Attention Transformer Networks for Long Document Classification. Hu, Yongli; Chen, Puman; Liu, Tengfei; Gao, Junbin; Sun, Yanfeng; Yin, Baocai. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [6] Long Document Ranking with Query-Directed Sparse Transformer. Jiang, Jyun-Yu; Xiong, Chenyan; Lee, Chia-Jung; Wang, Wei. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 4594-4605
  • [7] HM-Transformer: Hierarchical Multi-modal Transformer for Long Document Image Understanding. Deng, Xi; Li, Shasha; Yu, Jie; Ma, Jun. WEB AND BIG DATA, PT IV, APWEB-WAIM 2023, 2024, 14334: 232-245
  • [8] Socialformer: Social Network Inspired Long Document Modeling for Document Ranking. Zhou, Yujia; Dou, Zhicheng; Yuan, Huaying; Ma, Zhengyi. PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022: 339-347
  • [9] Poolingformer: Long Document Modeling with Pooling Attention. Zhang, Hang; Gong, Yeyun; Shen, Yelong; Li, Weisheng; Lv, Jiancheng; Duan, Nan; Chen, Weizhu. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Globalizing BERT-based Transformer Architectures for Long Document Summarization. Grail, Quentin; Perez, Julien; Gaussier, Eric. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 1792-1810