ERNIE-DOC: A Retrospective Long-Document Modeling Transformer

Cited by: 0
Authors: Ding, Siyu [1]; Shang, Junyuan [1]; Wang, Shuohuan [1]; Sun, Yu [1]; Tian, Hao [1]; Wu, Hua [1]; Wang, Haifeng [1]
Affiliation: [1] Baidu Inc., Beijing, People's Republic of China
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Transformers are not suited to processing long documents because their memory and time consumption grow quadratically with sequence length. Simply truncating a long document, or applying a sparse attention mechanism, either incurs the context-fragmentation problem or yields inferior modeling capability relative to models of comparable size. In this paper, we propose ERNIE-DOC, a document-level language pretraining model based on Recurrence Transformers (Dai et al., 2019). Two well-designed techniques, the retrospective feed mechanism and the enhanced recurrence mechanism, give ERNIE-DOC a much longer effective context length and enable it to capture the contextual information of a complete document. We pretrain ERNIE-DOC to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Experiments were conducted on both English and Chinese document-level tasks. ERNIE-DOC improved the state-of-the-art language modeling perplexity on WikiText-103 to 16.8. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.
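To make the two mechanisms named in the abstract concrete, the following is a minimal PyTorch sketch of segment-level recurrence with a same-layer memory update and a two-pass feed. It is not the authors' implementation: ToyLayer, encode_segment, and all sizes are illustrative assumptions; only the memory-update rule and the double feed of the document's segments follow the description above.

```python
# Minimal sketch of ERNIE-DOC's two key ideas (illustrative, not the paper's code).
import torch
import torch.nn as nn

N_LAYERS, D_MODEL, SEG_LEN = 4, 64, 16

class ToyLayer(nn.Module):
    """Stand-in for one Transformer layer that attends over cached memory."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D_MODEL, num_heads=4, batch_first=True)
        self.ff = nn.Linear(D_MODEL, D_MODEL)

    def forward(self, x, mem):
        # Keys/values cover [previous-segment memory; current segment],
        # as in Recurrence Transformers (Transformer-XL style).
        kv = x if mem is None else torch.cat([mem, x], dim=1)
        out, _ = self.attn(x, kv, kv)
        return x + self.ff(out)

layers = nn.ModuleList([ToyLayer() for _ in range(N_LAYERS)])

def encode_segment(x, mems):
    """Encode one segment and return the per-layer memories for the next one.

    Enhanced recurrence (as described in the abstract): the memory cached for
    layer l is layer l's OWN output on the previous segment, rather than layer
    l-1's output as in Transformer-XL, which lengthens the effective context.
    """
    new_mems, h = [], x
    for l, layer in enumerate(layers):
        h = layer(h, None if mems is None else mems[l])
        new_mems.append(h.detach())  # same-layer memory for the next segment
    return h, new_mems

# Retrospective feed: every segment of the document is fed twice. After the
# first (skimming) pass, the recurrent memory already reflects the whole
# document, so segments in the second (retrospective) pass see document-level
# context instead of only the preceding segments.
document = [torch.randn(1, SEG_LEN, D_MODEL) for _ in range(3)]  # 3 toy segments
mems = None
for phase in ("skimming", "retrospective"):
    for segment in document:
        hidden, mems = encode_segment(segment, mems)
```

The only line that distinguishes this sketch from a plain Transformer-XL loop is the one caching the current layer's own output as memory; the retrospective behavior comes entirely from carrying the memories into a second pass over the same segments.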
Pages: 2914-2927
Page count: 14
Related Papers (29 in total)
  • [1] Efficient Memory-Enhanced Transformer for Long-Document Summarization in Low-Resource Regimes. Moro, Gianluca; Ragazzi, Luca; Valgimigli, Lorenzo; Frisoni, Giacomo; Sartori, Claudio; Marfia, Gustavo. SENSORS, 2023, 23 (07)
  • [2] Segmented Summarization and Refinement: A Pipeline for Long-Document Analysis on Social Media. Wang, Guanghua; Garg, Priyanshi; Wu, Weili. Journal of Social Computing, 2024, 5 (02): 132-144
  • [3] Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling. Wu, Chuhan; Wu, Fangzhao; Qi, Tao; Huang, Yongfeng. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021: 848-853
  • [4] Doc-Former: A transformer-based document shadow denoising network. Pei, Shengchang; Liu, Jun; Yi, Niannian; Zhang, Yun; Liu, Zhengtao; Chen, Zengyan. 2023 THE 6TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA 2023, 2023: 139-143
  • [5] Hierarchical Attention Transformer Networks for Long Document Classification. Hu, Yongli; Chen, Puman; Liu, Tengfei; Gao, Junbin; Sun, Yanfeng; Yin, Baocai. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [6] Long Document Ranking with Query-Directed Sparse Transformer. Jiang, Jyun-Yu; Xiong, Chenyan; Lee, Chia-Jung; Wang, Wei. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 4594-4605
  • [7] HM-Transformer: Hierarchical Multi-modal Transformer for Long Document Image Understanding. Deng, Xi; Li, Shasha; Yu, Jie; Ma, Jun. WEB AND BIG DATA, PT IV, APWEB-WAIM 2023, 2024, 14334: 232-245
  • [8] Socialformer: Social Network Inspired Long Document Modeling for Document Ranking. Zhou, Yujia; Dou, Zhicheng; Yuan, Huaying; Ma, Zhengyi. PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022: 339-347
  • [9] Poolingformer: Long Document Modeling with Pooling Attention. Zhang, Hang; Gong, Yeyun; Shen, Yelong; Li, Weisheng; Lv, Jiancheng; Duan, Nan; Chen, Weizhu. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] Globalizing BERT-based Transformer Architectures for Long Document Summarization. Grail, Quentin; Perez, Julien; Gaussier, Eric. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 1792-1810