Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Cited by: 0
Authors
Pfeiffer, Jonas [1 ,2 ,3 ]
Goyal, Naman [3 ]
Lin, Xi Victoria [3 ]
Li, Xian [3 ]
Cross, James [3 ]
Riedel, Sebastian [3 ]
Artetxe, Mikel [3 ]
Affiliations
[1] NYU, New York, NY 10003 USA
[2] Tech Univ Darmstadt, Darmstadt, Germany
[3] Meta AI, Menlo Pk, CA 94025 USA
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (XMOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
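The abstract describes an architecture in which each transformer layer combines shared weights with one module per language, selected by the input's language, so that per-language trainable parameters stay constant while total capacity grows with the number of languages and new languages can be added after pre-training. Below is a minimal, illustrative PyTorch sketch of that idea; the class names (LanguageAdapter, ModularEncoderLayer), the bottleneck size, and the add_language helper are assumptions made for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LanguageAdapter(nn.Module):
    """Bottleneck module owned by a single language (illustrative only)."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: shared representation plus a language-specific update.
        return x + self.up(torch.relu(self.down(self.norm(x))))

class ModularEncoderLayer(nn.Module):
    """Standard Transformer encoder layer followed by a per-language module."""
    def __init__(self, hidden_size: int, num_heads: int, languages: list):
        super().__init__()
        self.shared = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        # One module per language; only the active language's module is used per batch.
        self.lang_modules = nn.ModuleDict(
            {lang: LanguageAdapter(hidden_size) for lang in languages}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        return self.lang_modules[lang](self.shared(x))

    def add_language(self, lang: str) -> None:
        # Post-hoc extension: attach a fresh module for the new language;
        # the shared weights and existing language modules are left untouched.
        self.lang_modules[lang] = LanguageAdapter(self.shared.linear1.in_features)

# Usage: route each batch through the module of its language.
layer = ModularEncoderLayer(hidden_size=768, num_heads=12, languages=["en", "de"])
hidden = torch.randn(2, 16, 768)   # (batch, seq_len, hidden)
out_en = layer(hidden, lang="en")
layer.add_language("sw")           # grow capacity without touching "en"/"de"
out_sw = layer(hidden, lang="sw")
```

In this sketch, extending to a new language only creates and trains the new module while everything else can stay frozen, which mirrors the abstract's claim that languages can be added post-hoc with no measurable drop in performance for the pre-trained ones.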
Pages: 3479-3495
Page count: 17
Related Papers
50 records in total
  • [41] Pre-training via Paraphrasing. Lewis, Mike; Ghazvininejad, Marjan; Ghosh, Gargi; Aghajanyan, Armen; Wang, Sida; Zettlemoyer, Luke. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [42] THE PRE-TRAINING SELECTION OF TEACHERS. Barr, A. S.; Douglas, Lois. JOURNAL OF EDUCATIONAL RESEARCH, 1934, 28 (02): 92-117
  • [43] Improving Fractal Pre-training. Anderson, Connor; Farrell, Ryan. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 2412-2421
  • [44] Pre-training phenotyping classifiers. Dligach, Dmitriy; Afshar, Majid; Miller, Timothy. JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 113 (113)
  • [45] Rethinking Pre-training and Self-training. Zoph, Barret; Ghiasi, Golnaz; Lin, Tsung-Yi; Cui, Yin; Liu, Hanxiao; Cubuk, Ekin D.; Le, Quoc V. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [46] Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis. Tang, Yucheng; Yang, Dong; Li, Wenqi; Roth, Holger R.; Landman, Bennett; Xu, Daguang; Nath, Vishwesh; Hatamizadeh, Ali. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 20698-20708
  • [47] Trajectory-BERT: Pre-training and fine-tuning bidirectional transformers for crowd trajectory enhancement. Li, Lingyu; Huang, Tianyu; Li, Yihao; Li, Peng. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2023, 34 (3-4)
  • [48] An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-training. Gao, Jin; Lin, Shubo; Wang, Shaoru; Kou, Yutong; Li, Zeming; Li, Liang; Zhang, Congxuan; Zhang, Xiaoqin; Wang, Yizheng; Hu, Weiming. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025
  • [49] Pre-Training Without Natural Images. Kataoka, Hirokatsu; Okayasu, Kazushige; Matsumoto, Asato; Yamagata, Eisuke; Yamada, Ryosuke; Inoue, Nakamasa; Nakamura, Akio; Satoh, Yutaka. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (04): 990-1007
  • [50] Dialogue-oriented Pre-training. Xu, Yi; Zhao, Hai. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 2663-2673