Pre-training via Paraphrasing

Cited by: 0
Authors
Lewis, Mike [1]
Ghazvininejad, Marjan [1]
Ghosh, Gargi [1]
Aghajanyan, Armen [1]
Wang, Sida [1]
Zettlemoyer, Luke [1]
Affiliations
[1] Facebook AI, New York, NY 10001 USA
Keywords:
DOI: not available
Chinese Library Classification: TP18 (Theory of Artificial Intelligence)
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks. For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation. We further show that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.
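To make the reconstruction objective described in the abstract concrete, below is a minimal, hypothetical sketch in PyTorch of a MARGE-style training step: a shared encoder embeds the target and a set of retrieved evidence documents, cosine-similarity relevance scores weight the evidence, and the model is trained to regenerate the target conditioned on that weighted evidence, so the retrieval scores receive gradient through the reconstruction likelihood. Everything here (the `ToyMarge` class, mean-pooled document embeddings, the GRU decoder) is an illustrative assumption, not the authors' implementation.

```python
# Toy sketch of a MARGE-style reconstruction loss (hypothetical, simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMarge(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def doc_embedding(self, tokens):
        # Mean-pool token embeddings as a stand-in for a document encoder.
        return self.embed(tokens).mean(dim=1)

    def forward(self, target, evidence_docs):
        # target: (1, T) token ids; evidence_docs: (K, S) token ids.
        z_t = self.doc_embedding(target)          # (1, dim)
        z_e = self.doc_embedding(evidence_docs)   # (K, dim)
        # Relevance scores between target and evidence, normalised over evidence.
        rel = F.softmax(F.cosine_similarity(z_t, z_e, dim=-1), dim=0)  # (K,)
        # Relevance-weighted evidence context; because the reconstruction loss
        # depends on these scores, the "retriever" is trained end-to-end.
        context = (rel.unsqueeze(-1) * z_e).sum(dim=0, keepdim=True)   # (1, dim)
        # Teacher-forced reconstruction of the target conditioned on the context.
        dec_in = self.embed(target[:, :-1]) + context.unsqueeze(1)
        hidden, _ = self.decoder(dec_in)
        logits = self.out(hidden)                 # (1, T-1, vocab)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               target[:, 1:].reshape(-1))

# Usage with toy data: one target document, three retrieved evidence documents.
model = ToyMarge()
target = torch.randint(0, 1000, (1, 12))
evidence = torch.randint(0, 1000, (3, 10))
loss = model(target, evidence)
loss.backward()
```

In the full method the relevance scores modulate cross-attention between the target and each evidence document rather than a single pooled vector; the pooled context above is only meant to show how reconstruction likelihood can supervise retrieval from a random initialization.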
Pages: 12
Related Papers (50 items)
  • [1] Yi, Mingyang; Hou, Lu; Sun, Jiacheng; Shang, Lifeng; Jiang, Xin; Liu, Qun; Ma, Zhi-Ming. Improved OOD Generalization via Adversarial Training and Pre-training. International Conference on Machine Learning (ICML), Vol. 139, 2021.
  • [2] Wu, Yuhuai; Li, Felix; Liang, Percy. Insights into Pre-training via Simpler Synthetic Tasks. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [3] Xie, Jinheng; Ye, Kai; Li, Yudong; Li, Yuexiang; Lin, Kevin Qinghong; Zheng, Yefeng; Shen, Linlin; Shou, Mike Zheng. Learning Visual Prior via Generative Pre-Training. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [4] Xing, Yue; Lin, Xiaofeng; Song, Qifan; Xu, Yi; Zeng, Belinda; Cheng, Guang. Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective. International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 238, 2024.
  • [5] Li, Yanghao; Fan, Haoqi; Hu, Ronghang; Feichtenhofer, Christoph; He, Kaiming. Scaling Language-Image Pre-training via Masking. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 23390-23400.
  • [6] Guo, Zhihui; Sharma, Pramod; Martinez, Andy; Du, Liang; Abraham, Robin. Multilingual Molecular Representation Learning via Contrastive Pre-training. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers), 2022, pp. 3441-3453.
  • [7] Herzig, Jonathan; Nowak, Pawel Krzysztof; Mueller, Thomas; Piccinno, Francesco; Eisenschlos, Julian Martin. TAPAS: Weakly Supervised Table Parsing via Pre-training. 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020, pp. 4320-4333.
  • [8] Zeng, Zheni; Xiao, Chaojun; Yao, Yuan; Xie, Ruobing; Liu, Zhiyuan; Lin, Fen; Lin, Leyu; Sun, Maosong. Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect. Frontiers in Big Data, Vol. 4, 2021.
  • [9] Pasa, Luca; Sperduti, Alessandro. Pre-training of Recurrent Neural Networks via Linear Autoencoders. Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.
  • [10] Cheng, Hang; Ye, Hehui; Zhou, Xiaofei; Liu, Ximeng; Chen, Fei; Wang, Meiqing. Vision-language pre-training via modal interaction. Pattern Recognition, Vol. 156, 2024.