Adapting Pretrained Text-to-Text Models for Long Text Sequences

Cited: 0
Authors
Xiong, Wenhan [1 ]
Gupta, Anchit [1 ]
Toshniwal, Shubham [1 ]
Mehdad, Yashar [1 ]
Yih, Wen-tau [1 ]
Affiliations
[1] Meta AI, Menlo Park, CA 94025, USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present an empirical study of adapting an existing pretrained text-to-text model to long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe for building long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention and pretrain the model with a masked-span prediction task whose spans vary in length. For the pretraining corpus, we find that randomly concatenating short documents from a large open-domain corpus yields better performance than using existing long-document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods with larger model sizes.
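The architectural change can be made concrete with a short sketch. The following is a minimal, hedged reading of pooling-augmented blockwise attention based only on the abstract: each query attends to the keys in its own block plus one pooled summary per block, so cost scales with block size plus block count rather than quadratically in sequence length. Mean pooling, the block size, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of pooling-augmented blockwise attention (assumptions: mean pooling,
# non-overlapping blocks, seq_len divisible by block_size).
import torch
import torch.nn.functional as F


def pooling_augmented_block_attention(q, k, v, block_size=4):
    """q, k, v: (batch, seq_len, dim). Each query sees its own block's keys
    plus one mean-pooled key/value per block of the full sequence."""
    b, n, d = q.shape
    nb = n // block_size

    # Local keys/values, reshaped into blocks: (b, nb, block_size, d)
    qb = q.view(b, nb, block_size, d)
    kb = k.view(b, nb, block_size, d)
    vb = v.view(b, nb, block_size, d)

    # Global summaries: one mean-pooled key/value per block: (b, nb, d)
    k_pool = kb.mean(dim=2)
    v_pool = vb.mean(dim=2)

    # Every block's queries attend to block_size local slots + nb pooled slots.
    k_all = torch.cat([kb, k_pool.unsqueeze(1).expand(b, nb, nb, d)], dim=2)
    v_all = torch.cat([vb, v_pool.unsqueeze(1).expand(b, nb, nb, d)], dim=2)

    scores = torch.einsum("bnqd,bnkd->bnqk", qb, k_all) / d**0.5
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum("bnqk,bnkd->bnqd", attn, v_all)
    return out.reshape(b, n, d)


x = torch.randn(2, 16, 64)
y = pooling_augmented_block_attention(x, x, x, block_size=4)
print(y.shape)  # torch.Size([2, 16, 64])
```

With block size B on a length-n sequence, each query scores B + n/B keys instead of n, which is the source of the savings over full attention.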
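The pretraining objective is masked-span prediction with spans of varying lengths. Below is a hedged token-level sketch in the spirit of T5 span corruption; the uniform span-length distribution, 15% mask ratio, and <extra_id_*> sentinel convention are assumptions (the sentinels are borrowed from T5), not details stated in the abstract.

```python
# Sketch of masked-span denoising with varying span lengths (assumed
# uniform span lengths in [min_span, max_span], T5-style sentinels).
import random


def corrupt_spans(tokens, mask_ratio=0.15, min_span=1, max_span=10, seed=None):
    """Replace random non-overlapping spans with sentinels; return (input, target)."""
    rng = random.Random(seed)
    n = len(tokens)
    masked = [False] * n
    budget = int(n * mask_ratio)
    attempts = 0
    while budget > 0 and attempts < 10 * n:
        attempts += 1
        span = rng.randint(min_span, min(max_span, budget))
        start = rng.randrange(0, n - span + 1)
        if any(masked[start:start + span]):
            continue  # would overlap an existing span; retry
        for i in range(start, start + span):
            masked[i] = True
        budget -= span

    # Encoder input keeps unmasked tokens; the decoder target reconstructs
    # each masked span after its matching sentinel.
    inp, tgt, sid, i = [], [], 0, 0
    while i < n:
        if masked[i]:
            sentinel = f"<extra_id_{sid}>"
            inp.append(sentinel)
            tgt.append(sentinel)
            while i < n and masked[i]:
                tgt.append(tokens[i])
                i += 1
            sid += 1
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt


inp, tgt = corrupt_spans("the quick brown fox jumps over the lazy dog".split(), seed=0)
print(inp, tgt)
```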
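For the corpus axis, the recipe packs randomly shuffled short documents into long training sequences instead of relying on domain-limited long-document corpora. A tiny packing sketch follows; the </s> separator and fixed target length are illustrative choices, not specifics from the paper.

```python
# Sketch of building long pretraining sequences by randomly concatenating
# short documents (each document is a list of tokens).
import random


def build_long_sequences(short_docs, target_len, sep_token="</s>", seed=None):
    """Shuffle short documents and pack them into ~target_len sequences."""
    rng = random.Random(seed)
    docs = short_docs[:]
    rng.shuffle(docs)
    sequences, current = [], []
    for doc in docs:
        current.extend(doc + [sep_token])
        if len(current) >= target_len:
            sequences.append(current[:target_len])
            current = current[target_len:]  # carry overflow into next sequence
    if current:
        sequences.append(current)  # final, possibly shorter, sequence
    return sequences
```

Because the concatenated documents are unrelated, the model still sees long inputs at the target context length while the data retains the domain diversity of the underlying open-domain corpus.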
Pages: 5566-5578
Page count: 13
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] Evaluation of Transfer Learning for Polish with a Text-to-Text Model
    Chrabrowa, Aleksandra
    Dragan, Lukasz
    Grzegorczyk, Karol
    Kajtoch, Dariusz
    Koszowski, Mikolaj
    Mroczkowski, Robert
    Rybak, Piotr
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4374 - 4394
  • [22] Text-to-Text Transfer Transformer Phrasing Model Using Enriched Text Input
    Rezackova, Marketa
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 389 - 400
  • [23] Product Titles-to-Attributes As a Text-to-Text Task
    Fuchs, Gilad
    Acriche, Yoni
    PROCEEDINGS OF THE 5TH WORKSHOP ON E-COMMERCE AND NLP (ECNLP 5), 2022, : 91 - 98
  • [24] A Text-to-Text Model for Multilingual Offensive Language Identification
    Ranasinghe, Tharindu
    Zampieri, Marcos
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 375 - 384
  • [25] Pretrained Language Models for Text Generation: A Survey
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4492 - 4499
  • [26] Text-to-text generative approach for enhanced complex word identification
    Sliwiak, Patrycja
    Shah, Syed Afaq Ali
    NEUROCOMPUTING, 2024, 610
  • [27] TESS: Text-to-Text Self-Conditioned Simplex Diffusion
    Mahabadi, Rabeeh Karimi
    Ivison, Hamish
    Tae, Jaesung
    Henderson, James
    Beltagy, Iz
    Peters, Matthew E.
    Cohan, Arman
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2347 - 2361
  • [28] Text-to-text machine translation using the RECONTRA connectionist model
    Castaño, MA
    Casacuberta, F
    ENGINEERING APPLICATIONS OF BIO-INSPIRED ARTIFICIAL NEURAL NETWORKS, VOL II, 1999, 1607 : 683 - 692
  • [29] TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
    Berabi, Berkay
    He, Jingxuan
    Raychev, Veselin
    Vechev, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [30] Exploring the limits of transfer learning with a unified text-to-text transformer
    Raffel, Colin
    Shazeer, Noam
    Roberts, Adam
    Lee, Katherine
    Narang, Sharan
    Matena, Michael
    Zhou, Yanqi
    Li, Wei
    Liu, Peter J.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21