Adapting Pretrained Text-to-Text Models for Long Text Sequences

Cited by: 0
Authors:
Xiong, Wenhan [1]
Gupta, Anchit [1]
Toshniwal, Shubham [1]
Mehdad, Yashar [1]
Yih, Wen-tau [1]
Affiliations:
[1] Meta AI, Menlo Park, CA 94025, USA
Keywords: (none listed)
DOI: (none available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe for building long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention and pretrain the model on a masked-span prediction task with spans of varying lengths. As for the pretraining corpus, we find that randomly concatenating short documents from a large open-domain corpus yields better performance than using existing long-document corpora, which are typically limited in domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods that use larger models.
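The two model-side changes in this recipe are easy to illustrate. Below is a minimal, single-head sketch of pooling-augmented blockwise attention, assuming non-overlapping blocks and mean-pooled per-block summaries as the global view; the function name, block layout, and pooling choice are illustrative assumptions for this sketch, not the authors' implementation (which operates inside a full multi-head transformer).

```python
# Minimal single-head sketch of pooling-augmented blockwise attention.
# Illustrative assumptions: non-overlapping blocks, mean pooling, no
# padding or causal masks. Not the authors' implementation.
import torch
import torch.nn.functional as F

def pooling_augmented_block_attention(q, k, v, block_size):
    """q, k, v: (T, d). Each query attends to its local block plus a
    mean-pooled summary of every block (a cheap global view)."""
    T, d = q.shape
    assert T % block_size == 0, "pad inputs to a multiple of block_size"
    n = T // block_size

    k_loc = k.view(n, block_size, d)          # per-block keys
    v_loc = v.view(n, block_size, d)          # per-block values
    k_pool = k_loc.mean(dim=1)                # (n, d): one summary key per block
    v_pool = v_loc.mean(dim=1)                # (n, d): one summary value per block

    q_blk = q.view(n, block_size, d)
    s_loc = q_blk @ k_loc.transpose(1, 2)     # (n, B, B) local scores
    s_glb = q_blk @ k_pool.T                  # (n, B, n) scores vs. pooled blocks

    attn = F.softmax(torch.cat([s_loc, s_glb], dim=-1) * d ** -0.5, dim=-1)
    v_all = torch.cat([v_loc, v_pool.unsqueeze(0).expand(n, -1, -1)], dim=1)
    return (attn @ v_all).reshape(T, d)       # (T, d)

out = pooling_augmented_block_attention(
    torch.randn(1024, 64), torch.randn(1024, 64), torch.randn(1024, 64),
    block_size=128)
print(out.shape)  # torch.Size([1024, 64])
```

Each query scores block_size + n entries instead of all T tokens, so the cost grows roughly as T(B + T/B) rather than T^2. The pretraining objective can be sketched in the same hedged spirit: T5-style span corruption where span lengths are drawn from a mix. The mask rate, length mix, and sentinel format below are placeholder assumptions, not the paper's settings.

```python
# Hedged sketch of masked-span prediction with spans of varying lengths.
# mask_rate and span_lengths are placeholders, not the paper's values.
import random

def corrupt_spans(tokens, mask_rate=0.15, span_lengths=(3, 8, 64)):
    """Return (inputs, targets): sampled spans in `inputs` are replaced by
    sentinels; `targets` lists each sentinel followed by its span."""
    n = len(tokens)
    budget = int(n * mask_rate)
    masked, spans = [False] * n, []
    while budget > 0:
        length = min(random.choice(span_lengths), budget)
        start = random.randrange(0, n - length + 1)
        if any(masked[start:start + length]):
            continue                          # resample on overlap
        masked[start:start + length] = [True] * length
        spans.append((start, length))
        budget -= length
    inputs, targets, pos = [], [], 0
    for i, (start, length) in enumerate(sorted(spans)):
        inputs += tokens[pos:start] + [f"<extra_id_{i}>"]
        targets += [f"<extra_id_{i}>"] + tokens[start:start + length]
        pos = start + length
    return inputs + tokens[pos:], targets
```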
Pages: 5566-5578 (13 pages)