Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Cited by: 0
Authors:
Pfeiffer, Jonas [1 ,2 ,3 ]
Goyal, Naman [3 ]
Lin, Xi Victoria [3 ]
Li, Xian [3 ]
Cross, James [3 ]
Riedel, Sebastian [3 ]
Artetxe, Mikel [3 ]
Affiliations:
[1] NYU, New York, NY 10003 USA
[2] Tech Univ Darmstadt, Darmstadt, Germany
[3] Meta AI, Menlo Pk, CA 94025 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (XMOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
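
The abstract describes the mechanism only in prose, so the following is a minimal sketch (PyTorch assumed; the record names no implementation language) of what a shared transformer layer with per-language bottleneck modules might look like. The class names (LanguageModule, ModularTransformerLayer), the bottleneck size, and the exact placement of the module after the feed-forward block are illustrative assumptions, not the authors' released XMOD code; the point is only to show why the trainable parameters used for any single language stay constant and why a language can be added post-hoc without touching shared weights.

```python
# Illustrative sketch only: a shared transformer layer augmented with
# per-language bottleneck modules. Sizes, names, and module placement
# are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class LanguageModule(nn.Module):
    """Per-language bottleneck module; one instance exists per language."""

    def __init__(self, d_model: int = 768, d_bottleneck: int = 384):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transformation on top of the shared layer output.
        return x + self.up(torch.relu(self.down(self.norm(x))))


class ModularTransformerLayer(nn.Module):
    """Shared self-attention and feed-forward, followed by a language-specific module."""

    def __init__(self, languages, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.ffn_norm = nn.LayerNorm(d_model)
        # One module per language: total capacity grows with the number of
        # languages, but each forward pass uses only one module.
        self.lang_modules = nn.ModuleDict(
            {lang: LanguageModule(d_model) for lang in languages}
        )

    def add_language(self, lang: str, d_model: int = 768) -> None:
        # Post-hoc extension: shared weights are untouched; only the new
        # language's module needs to be trained.
        self.lang_modules[lang] = LanguageModule(d_model)

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h = self.attn_norm(x + self.attn(x, x, x, need_weights=False)[0])
        h = self.ffn_norm(h + self.ffn(h))
        # Route through the module of the input's language only.
        return self.lang_modules[lang](h)


# Example: English inputs use the shared layer plus the "en" module;
# adding Swahili later leaves all existing parameters unchanged.
layer = ModularTransformerLayer(languages=["en", "de", "ar"])
x = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
out_en = layer(x, lang="en")
layer.add_language("sw")
out_sw = layer(x, lang="sw")
```

Routing here is a plain dictionary lookup keyed on a language ID supplied with the batch, which keeps per-example compute and per-language trainable parameters independent of how many languages the model covers.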
Pages: 3479-3495
Page count: 17