MASS: Masked Sequence to Sequence Pre-training for Language Generation

Cited by: 0
Authors
Song, Kaitao [1 ]
Tan, Xu [2 ]
Qin, Tao [2 ]
Lu, Jianfeng [1 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Key Lab Intelligent Percept & Syst High Dimens In, Minist Educ, Nanjing, Peoples R China
[2] Microsoft Res, Cambridge, England
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018), have achieved great success in language understanding by transferring knowledge from a rich-resource pre-training task to low/zero-resource downstream tasks. Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with a randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of representation extraction and language modeling. By further fine-tuning on a variety of zero/low-resource language generation tasks, including neural machine translation, text summarization and conversational response generation (3 tasks and 8 datasets in total), MASS achieves significant improvements over baselines without pre-training or with other pre-training methods. Notably, we achieve state-of-the-art accuracy (37.5 in terms of BLEU score) on unsupervised English-French translation, even beating the early attention-based supervised model (Bahdanau et al., 2015b).
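The abstract describes the pre-training objective: the encoder receives the sentence with a contiguous fragment replaced by mask tokens, and the decoder is trained to generate exactly that fragment. The sketch below illustrates this masking scheme; it is a minimal illustration, not the authors' released code. The function name mass_mask, the "[MASK]" token string, the whitespace tokenization, and the default mask ratio of 0.5 are assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mass_mask(tokens, mask_ratio=0.5, seed=None):
    """Build encoder input and decoder input/target, MASS-style.

    The encoder sees the sentence with one contiguous fragment replaced
    by [MASK] tokens; the decoder is trained to predict that fragment.
    """
    rng = random.Random(seed)
    n = len(tokens)
    span_len = max(1, int(n * mask_ratio))      # length of the masked fragment
    start = rng.randint(0, n - span_len)        # random start of the fragment
    end = start + span_len

    fragment = tokens[start:end]                # what the decoder must reproduce
    encoder_input = tokens[:start] + [MASK] * span_len + tokens[end:]

    # Teacher forcing: the decoder input is the fragment shifted right by one,
    # so position t predicts fragment[t] given fragment[:t].
    decoder_input = [MASK] + fragment[:-1]
    decoder_target = fragment
    return encoder_input, decoder_input, decoder_target


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    enc_in, dec_in, dec_tgt = mass_mask(sentence, mask_ratio=0.5, seed=0)
    print("encoder input :", enc_in)
    print("decoder input :", dec_in)
    print("decoder target:", dec_tgt)
```

Running the example prints the encoder input containing the contiguous [MASK] span and the fragment the decoder is trained to reconstruct, which is the joint encoder-decoder training signal the abstract refers to.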
Pages: 11
Related Papers
50 items in total
  • [1] MAPGN: MASKED POINTER-GENERATOR NETWORK FOR SEQUENCE-TO-SEQUENCE PRE-TRAINING
    Ihori, Mana
    Makishima, Naoki
    Tanaka, Tomohiro
    Takashima, Akihiko
    Orihashi, Shota
    Masumura, Ryo
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7563 - 7567
  • [2] Denoising based Sequence-to-Sequence Pre-training for Text Generation
    Wang, Liang
    Zhao, Wei
    Jia, Ruoyu
    Li, Sujian
    Liu, Jingming
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4003 - 4015
  • [3] SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
    Yan, Hong
    Liu, Yang
    Wei, Yushen
    Li, Zhen
    Li, Guanbin
    Lin, Liang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5583 - 5595
  • [4] DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
    Huang, Luyang
    Niu, Guocheng
    Liu, Jiachen
    Xiao, Xinyan
    Wu, Hua
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2552 - 2566
  • [5] Improving AMR Parsing with Sequence-to-Sequence Pre-training
    Xu, Dongqin
    Li, Junhui
    Zhu, Muhua
Zhang, Min
    Zhou, Guodong
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2501 - 2511
  • [6] MPNet: Masked and Permuted Pre-training for Language Understanding
    Song, Kaitao
    Tan, Xu
    Qin, Tao
    Lu, Jianfeng
    Liu, Tie-Yan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [7] Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
    Zhou, Wangchunshu
    Ge, Tao
    Xu, Canwen
    Xu, Ke
    Wei, Furu
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 571 - 582
  • [8] Unifying Event Detection and Captioning as Sequence Generation via Pre-training
    Zhang, Qi
    Song, Yuqing
    Jin, Qin
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 363 - 379
  • [9] TWO-STAGE PRE-TRAINING FOR SEQUENCE TO SEQUENCE SPEECH RECOGNITION
    Fan, Zhiyun
    Zhou, Shiyu
    Xu, Bo
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [10] Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
    Chapuis, Emile
    Colombo, Pierre
    Manica, Matteo
    Labeau, Matthieu
    Clavel, Chloe
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2636 - 2648