Denoising based Sequence-to-Sequence Pre-training for Text Generation

Cited by: 0
Authors
Wang, Liang [1 ]
Zhao, Wei [1 ]
Jia, Ruoyu [1 ]
Li, Sujian [2 ]
Liu, Jingming [1 ]
Affiliations
[1] Yuanfudao AI Lab, Beijing, Peoples R China
[2] Peking Univ, Key Lab Computat Linguist, MOE, Beijing, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This paper presents a new sequence-to-sequence (seq2seq) pre-training method, PoDA (Pre-training of Denoising Autoencoders), which learns representations suitable for text generation tasks. Unlike encoder-only (e.g., BERT) or decoder-only (e.g., OpenAI GPT) pre-training approaches, PoDA jointly pre-trains both the encoder and the decoder by denoising noise-corrupted text, and it has the additional advantage of keeping the network architecture unchanged in the subsequent fine-tuning stage. We also design a hybrid model of Transformer and pointer-generator networks as the backbone architecture for PoDA. We conduct experiments on two text generation tasks: abstractive summarization and grammatical error correction. Results on four datasets show that PoDA can improve model performance over strong baselines without using any task-specific techniques and significantly speed up convergence.
Pages: 4003-4015
Number of pages: 13
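The denoising objective described in the abstract can be illustrated with a minimal sketch: corrupt clean text with simple token-level noise and train a seq2seq model to reconstruct the original. The corrupt function below, its noise probabilities, and the <mask> symbol are illustrative assumptions for this sketch, not the exact corruption scheme or the Transformer/pointer-generator architecture used by PoDA.

    # Illustrative sketch of denoising-style seq2seq pre-training data creation.
    # The noise operations and probabilities are assumptions, not PoDA's exact recipe.
    import random

    MASK = "<mask>"

    def corrupt(tokens, p_delete=0.1, p_mask=0.1, shuffle_window=3):
        """Apply simple token-level noise: random deletion, masking, and local shuffling."""
        noisy = []
        for tok in tokens:
            r = random.random()
            if r < p_delete:
                continue                      # drop the token
            elif r < p_delete + p_mask:
                noisy.append(MASK)            # replace the token with a mask symbol
            else:
                noisy.append(tok)             # keep the token unchanged
        # Local shuffling: permute tokens within small windows
        for i in range(0, len(noisy), shuffle_window):
            window = noisy[i:i + shuffle_window]
            random.shuffle(window)
            noisy[i:i + shuffle_window] = window
        return noisy

    if __name__ == "__main__":
        sentence = "the model learns to reconstruct clean text from corrupted input".split()
        print("clean:", " ".join(sentence))
        print("noisy:", " ".join(corrupt(sentence)))

Pre-training then amounts to feeding the noisy sequence to the encoder and training the decoder to regenerate the clean sequence with a standard cross-entropy loss, so the encoder-decoder network can later be fine-tuned on downstream generation tasks without any architectural change.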
Related Papers
50 results in total
  • [1] Xu, Dongqin; Li, Junhui; Zhu, Muhua; Zhang, Min; Zhou, Guodong. Improving AMR Parsing with Sequence-to-Sequence Pre-training. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020: 2501-2511.
  • [2] Zhou, Wangchunshu; Ge, Tao; Xu, Canwen; Xu, Ke; Wei, Furu. Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 571-582.
  • [3] Ihori, Mana; Makishima, Naoki; Tanaka, Tomohiro; Takashima, Akihiko; Orihashi, Shota; Masumura, Ryo. MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-training. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 7563-7567.
  • [4] Qi, Weizhen; Yan, Yu; Gong, Yeyun; Liu, Dayiheng; Duan, Nan; Chen, Jiusheng; Zhang, Ruofei; Zhou, Ming. ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020: 2401-2410.
  • [5] Huang, Luyang; Niu, Guocheng; Liu, Jiachen; Xiao, Xinyan; Wu, Hua. DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training. Findings of the Association for Computational Linguistics (ACL 2022), 2022: 2552-2566.
  • [6] Song, Kaitao; Tan, Xu; Qin, Tao; Lu, Jianfeng; Liu, Tie-Yan. MASS: Masked Sequence to Sequence Pre-training for Language Generation. International Conference on Machine Learning, Vol. 97, 2019.
  • [7] Yu, Tingrui; Gu, Xiaodong; Shen, Beijun. Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 2022: 229-238.
  • [8] Niu, Changan; Li, Chuanyi; Ng, Vincent; Ge, Jidong; Huang, Liguo; Luo, Bin. SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations. 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2022: 2006-2018.
  • [9] Mueller, Aaron; Frank, Robert; Linzen, Tal; Wang, Luheng; Schuster, Sebastian. Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models. Findings of the Association for Computational Linguistics (ACL 2022), 2022: 1352-1368.
  • [10] Li, Jiajia; Wang, Ping; Li, Zuchao; Liu, Xi; Utiyama, Masao; Sumita, Eiichiro; Zhao, Hai; Ai, Haojun. A Fuzzy Training Framework for Controllable Sequence-to-Sequence Generation. IEEE Access, 2022, 10: 92467-92480.