Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Cited by: 0
Authors
Pfeiffer, Jonas [1,2,3]
Goyal, Naman [3]
Lin, Xi Victoria [3]
Li, Xian [3]
Cross, James [3]
Riedel, Sebastian [3]
Artetxe, Mikel [3]
Affiliations
[1] New York University, New York, NY 10003, USA
[2] Technical University of Darmstadt, Darmstadt, Germany
[3] Meta AI, Menlo Park, CA 94025, USA
Keywords
DOI: Not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (XMOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
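As a rough illustration of the architecture the abstract describes, the sketch below keeps the attention and feed-forward weights of a Transformer layer shared across languages but routes each example through a language-specific bottleneck module, and adds a new language post-hoc by registering a fresh module while the shared weights stay fixed. Class names (XModLayer, LanguageAdapter), the bottleneck width of 384, and the exact adapter placement are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class LanguageAdapter(nn.Module):
    """Bottleneck module owned by a single language (illustrative)."""

    def __init__(self, d_model: int, d_bottleneck: int = 384):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck applied on top of the shared layer output.
        return x + self.up(torch.relu(self.down(self.norm(x))))


class XModLayer(nn.Module):
    """Shared self-attention and FFN, followed by a per-language module."""

    def __init__(self, d_model: int, n_heads: int, languages: list):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.ffn_norm = nn.LayerNorm(d_model)
        # One module per pre-training language: total capacity grows with the
        # number of languages, but each forward pass activates exactly one module.
        self.adapters = nn.ModuleDict(
            {lang: LanguageAdapter(d_model) for lang in languages}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        x = x + self.ffn(self.ffn_norm(x))
        return self.adapters[lang](x)  # route through the active language's module

    def add_language(self, lang: str, d_model: int) -> None:
        # Post-hoc extension: register a fresh module for a new language while
        # the shared weights stay frozen, so existing languages are unaffected.
        self.adapters[lang] = LanguageAdapter(d_model)


# Usage: route a batch through the module of its language.
layer = XModLayer(d_model=768, n_heads=12, languages=["en", "de", "sw"])
hidden = torch.randn(2, 16, 768)  # (batch, seq_len, d_model)
out_de = layer(hidden, lang="de")
layer.add_language("qu", d_model=768)  # add a new language after pre-training
out_qu = layer(hidden, lang="qu")
```

Because only one language's module is active per example, the trainable parameters seen by each language stay constant even as total model capacity grows with the number of modules, which is the mechanism the abstract credits for mitigating negative interference while still allowing positive transfer through the shared weights.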
Pages: 3479-3495
Number of pages: 17
Related Papers
50 records in total
  • [31] Evolving Deep Architectures: A New Blend of CNNs and Transformers Without Pre-training Dependencies
    Kiiskila, Manu
    Kiiskila, Padmasheela
    DEEP LEARNING THEORY AND APPLICATIONS, PT I, DELTA 2024, 2024, 2171 : 163 - 175
  • [32] CvFormer: Cross-view transFormers with pre-training for fMRI analysis of human brain
    Meng, Xiangzhu
    Wei, Wei
    Liu, Qiang
    Wang, Yu
    Li, Min
    Wang, Liang
    PATTERN RECOGNITION LETTERS, 2024, 186 : 85 - 90
  • [33] Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning
    Sun, Tianxiang
    He, Zhengfu
    Zhu, Qin
    Qiu, Xipeng
    Huang, Xuanjing
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11156 - 11172
  • [34] Multi-stage Pre-training over Simplified Multimodal Pre-training Models
    Liu, Tongtong
    Feng, Fangxiang
    Wang, Xiaojie
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2556 - 2565
  • [35] Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
    Dong, Haoyu
    Cheng, Zhoujun
    He, Xinyi
    Zhou, Mengyu
    Zhou, Anda
    Zhou, Fan
    Liu, Ao
    Han, Shi
    Zhang, Dongmei
    PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 5426 - 5435
  • [36] ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification
    Lin, Xinjie
    Xiong, Gang
    Gou, Gaopeng
    Li, Zhen
    Shi, Junzheng
    Yu, Jing
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 633 - 642
  • [37] Rethinking ImageNet Pre-training
    He, Kaiming
    Girshick, Ross
    Dollar, Piotr
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4917 - 4926
  • [38] Photo Pre-Training, But for Sketch
    Ke, L.
    Pang, Kaiyue
    Song, Yi-Zhe
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2754 - 2764
  • [39] Pre-Training to Learn in Context
    Gu, Yuxian
    Dong, Li
    Wei, Furu
    Huang, Minlie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 4849 - 4870
  • [40] mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations
    Pfeiffer, Jonas
    Piccinno, Francesco
    Nicosia, Massimo
    Wang, Xinyi
    Reid, Machel
    Ruder, Sebastian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 1978 - 2008