Adapting Pretrained Text-to-Text Models for Long Text Sequences

Cited: 0
Authors
Xiong, Wenhan [1 ]
Gupta, Anchit [1 ]
Toshniwal, Shubham [1 ]
Mehdad, Yashar [1 ]
Yih, Wen-tau [1 ]
Affiliations
[1] Meta AI, Menlo Park, CA 94025, USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present an empirical study of adapting an existing pretrained text-to-text model to long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline (model architecture, optimization objective, and pretraining corpus), we propose an effective recipe for building long-context models from existing short-context models. Specifically, we replace the full attention in transformers with pooling-augmented blockwise attention and pretrain the model with a masked-span prediction task whose spans vary in length. For the pretraining corpus, we find that randomly concatenating short documents from a large open-domain corpus yields better performance than using existing long-document corpora, which are typically limited in their domain coverage. With these findings, we build a long-context model that achieves competitive performance on long-text QA tasks and establishes a new state of the art on five long-text summarization datasets, often outperforming previous methods with larger model sizes.
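The architectural change can be made concrete with a short sketch. The following is a minimal, hedged reading of pooling-augmented blockwise attention based only on the abstract: each query attends to the keys in its own block plus one pooled summary per block, so cost scales with block size plus block count rather than quadratically in sequence length. Mean pooling, the block size, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of pooling-augmented blockwise attention (assumptions: mean pooling,
# non-overlapping blocks, seq_len divisible by block_size).
import torch
import torch.nn.functional as F


def pooling_augmented_block_attention(q, k, v, block_size=4):
    """q, k, v: (batch, seq_len, dim). Each query sees its own block's keys
    plus one mean-pooled key/value per block of the full sequence."""
    b, n, d = q.shape
    nb = n // block_size

    # Local keys/values, reshaped into blocks: (b, nb, block_size, d)
    qb = q.view(b, nb, block_size, d)
    kb = k.view(b, nb, block_size, d)
    vb = v.view(b, nb, block_size, d)

    # Global summaries: one mean-pooled key/value per block: (b, nb, d)
    k_pool = kb.mean(dim=2)
    v_pool = vb.mean(dim=2)

    # Every block's queries attend to block_size local slots + nb pooled slots.
    k_all = torch.cat([kb, k_pool.unsqueeze(1).expand(b, nb, nb, d)], dim=2)
    v_all = torch.cat([vb, v_pool.unsqueeze(1).expand(b, nb, nb, d)], dim=2)

    scores = torch.einsum("bnqd,bnkd->bnqk", qb, k_all) / d**0.5
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum("bnqk,bnkd->bnqd", attn, v_all)
    return out.reshape(b, n, d)


x = torch.randn(2, 16, 64)
y = pooling_augmented_block_attention(x, x, x, block_size=4)
print(y.shape)  # torch.Size([2, 16, 64])
```

With block size B on a length-n sequence, each query scores B + n/B keys instead of n, which is the source of the savings over full attention.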
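The pretraining objective is masked-span prediction with spans of varying lengths. Below is a hedged token-level sketch in the spirit of T5 span corruption; the uniform span-length distribution, 15% mask ratio, and <extra_id_*> sentinel convention are assumptions (the sentinels are borrowed from T5), not details stated in the abstract.

```python
# Sketch of masked-span denoising with varying span lengths (assumed
# uniform span lengths in [min_span, max_span], T5-style sentinels).
import random


def corrupt_spans(tokens, mask_ratio=0.15, min_span=1, max_span=10, seed=None):
    """Replace random non-overlapping spans with sentinels; return (input, target)."""
    rng = random.Random(seed)
    n = len(tokens)
    masked = [False] * n
    budget = int(n * mask_ratio)
    attempts = 0
    while budget > 0 and attempts < 10 * n:
        attempts += 1
        span = rng.randint(min_span, min(max_span, budget))
        start = rng.randrange(0, n - span + 1)
        if any(masked[start:start + span]):
            continue  # would overlap an existing span; retry
        for i in range(start, start + span):
            masked[i] = True
        budget -= span

    # Encoder input keeps unmasked tokens; the decoder target reconstructs
    # each masked span after its matching sentinel.
    inp, tgt, sid, i = [], [], 0, 0
    while i < n:
        if masked[i]:
            sentinel = f"<extra_id_{sid}>"
            inp.append(sentinel)
            tgt.append(sentinel)
            while i < n and masked[i]:
                tgt.append(tokens[i])
                i += 1
            sid += 1
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt


inp, tgt = corrupt_spans("the quick brown fox jumps over the lazy dog".split(), seed=0)
print(inp, tgt)
```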
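For the corpus axis, the recipe packs randomly shuffled short documents into long training sequences instead of relying on domain-limited long-document corpora. A tiny packing sketch follows; the </s> separator and fixed target length are illustrative choices, not specifics from the paper.

```python
# Sketch of building long pretraining sequences by randomly concatenating
# short documents (each document is a list of tokens).
import random


def build_long_sequences(short_docs, target_len, sep_token="</s>", seed=None):
    """Shuffle short documents and pack them into ~target_len sequences."""
    rng = random.Random(seed)
    docs = short_docs[:]
    rng.shuffle(docs)
    sequences, current = [], []
    for doc in docs:
        current.extend(doc + [sep_token])
        if len(current) >= target_len:
            sequences.append(current[:target_len])
            current = current[target_len:]  # carry overflow into next sequence
    if current:
        sequences.append(current)  # final, possibly shorter, sequence
    return sequences
```

Because the concatenated documents are unrelated, the model still sees long inputs at the target context length while the data retains the domain diversity of the underlying open-domain corpus.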
Pages: 5566-5578
Page count: 13
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] Evaluation of Transfer Learning for Polish with a Text-to-Text Model
    Chrabrowa, Aleksandra
    Dragan, Lukasz
    Grzegorczyk, Karol
    Kajtoch, Dariusz
    Koszowski, Mikolaj
    Mroczkowski, Robert
    Rybak, Piotr
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4374 - 4394
  • [22] Text-to-Text Transfer Transformer Phrasing Model Using Enriched Text Input
    Rezackova, Marketa
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 389 - 400
  • [23] Product Titles-to-Attributes As a Text-to-Text Task
    Fuchs, Gilad
    Acriche, Yoni
    PROCEEDINGS OF THE 5TH WORKSHOP ON E-COMMERCE AND NLP (ECNLP 5), 2022, : 91 - 98
  • [24] A Text-to-Text Model for Multilingual Offensive Language Identification
    Ranasinghe, Tharindu
    Zampieri, Marcos
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 375 - 384
  • [25] Pretrained Language Models for Text Generation: A Survey
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4492 - 4499
  • [26] Text-to-text generative approach for enhanced complex word identification
    Sliwiak, Patrycja
    Shah, Syed Afaq Ali
    NEUROCOMPUTING, 2024, 610
  • [27] TESS: Text-to-Text Self-Conditioned Simplex Diffusion
    Mahabadi, Rabeeh Karimi
    Ivison, Hamish
    Tae, Jaesung
    Henderson, James
    Beltagy, Iz
    Peters, Matthew E.
    Cohan, Arman
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 2347 - 2361
  • [28] Text-to-text machine translation using the RECONTRA connectionist model
    Castaño, MA
    Casacuberta, F
    ENGINEERING APPLICATIONS OF BIO-INSPIRED ARTIFICIAL NEURAL NETWORKS, VOL II, 1999, 1607 : 683 - 692
  • [29] TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
    Berabi, Berkay
    He, Jingxuan
    Raychev, Veselin
    Vechev, Martin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [30] Exploring the limits of transfer learning with a unified text-to-text transformer
    Raffel, Colin
    Shazeer, Noam
    Roberts, Adam
    Lee, Katherine
    Narang, Sharan
    Matena, Michael
    Zhou, Yanqi
    Li, Wei
    Liu, Peter J.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21