Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Cited by: 0
Authors
Pfeiffer, Jonas [1 ,2 ,3 ]
Goyal, Naman [3 ]
Lin, Xi Victoria [3 ]
Li, Xian [3 ]
Cross, James [3 ]
Riedel, Sebastian [3 ]
Artetxe, Mikel [3 ]
Affiliations
[1] NYU, New York, NY 10003 USA
[2] Tech Univ Darmstadt, Darmstadt, Germany
[3] Meta AI, Menlo Pk, CA 94025 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (XMOD) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
Pages: 3479-3495
Number of pages: 17
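The abstract above summarizes the core X-MOD idea: a shared transformer body plus one small module per language, so the number of trainable parameters per language stays constant while total capacity grows with the number of languages, and new languages can be added post-hoc by training only a fresh module. Below is a minimal, hedged PyTorch sketch of that design; the class names, bottleneck size, and residual/normalization placement are illustrative assumptions on my part, not the authors' released implementation.

```python
# Sketch of an X-MOD-style transformer layer: shared attention and feed-forward
# weights plus a per-language bottleneck module selected by language ID.
# Hyperparameters and structure are illustrative assumptions, not the paper's exact code.
import torch
import torch.nn as nn


class LanguageModule(nn.Module):
    """Per-language bottleneck MLP applied after the shared feed-forward block."""

    def __init__(self, d_model: int, d_bottleneck: int):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(x)))


class ModularTransformerLayer(nn.Module):
    """Shared self-attention / feed-forward weights plus one module per language."""

    def __init__(self, languages, d_model=768, n_heads=12, d_ff=3072, d_bottleneck=384):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        # One bottleneck module per pre-training language: total capacity grows
        # with the number of languages, but each forward pass uses only one module.
        self.lang_modules = nn.ModuleDict(
            {lang: LanguageModule(d_model, d_bottleneck) for lang in languages}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ffn(x))
        # Route the hidden states through the module of the input's language only.
        x = self.norm3(x + self.lang_modules[lang](x))
        return x


if __name__ == "__main__":
    layer = ModularTransformerLayer(["en", "de", "sw"])
    hidden = torch.randn(2, 16, 768)  # (batch, sequence length, d_model)
    out = layer(hidden, lang="de")
    print(out.shape)  # torch.Size([2, 16, 768])

    # Adding a language post-hoc: register a fresh module and train only it,
    # keeping the shared weights and all other languages' modules frozen.
    layer.lang_modules["fi"] = LanguageModule(768, 384)
    for name, p in layer.named_parameters():
        p.requires_grad = name.startswith("lang_modules.fi")
```

The trailing lines mirror the post-hoc language extension described in the abstract: only the newly registered language module receives gradients, while every shared parameter stays fixed.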