Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition

Cited by: 0
Authors
Pineiro-Martin, Andres [1,2]
Garcia-Mateo, Carmen [1]
Docio-Fernandez, Laura [1]
Del Carmen Lopez-Perez, Maria [2]
Rehm, Georg [3]
Affiliations
[1] Univ Vigo, AtlanTTic Res Ctr, GTM Res Grp, Vigo, Spain
[2] Balidea Consulting & Programming SL, Santiago De Compostela, Spain
[3] DFKI GmbH, Speech & Language Technol Lab, Berlin, Germany
Source
Interspeech 2024
Keywords
Continual multilingual learning; automatic speech recognition; weighted cross-entropy; low-resource language; data augmentation
DOI
10.21437/Interspeech.2024-734
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
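To make the idea concrete, below is a minimal sketch in PyTorch of a language-weighted cross-entropy loss of the kind the abstract describes. It is not the authors' implementation: the function name, the language codes (with "gl" standing in for the low-resource language), and the fixed weight values are illustrative assumptions, whereas the paper's weights are set dynamically during continual multilingual learning.

import torch
import torch.nn.functional as F

# Hypothetical per-language loss weights; the low-resource language gets a larger
# weight so its utterances contribute more to the gradient. Placeholder values only.
LANG_WEIGHTS = {"en": 1.0, "es": 1.0, "fr": 1.0, "de": 1.0, "it": 1.0, "gl": 2.0}

def language_weighted_ce(logits, targets, langs, pad_id=-100):
    """Cross-entropy over decoder logits, rescaled per utterance by its language weight.

    logits:  (batch, seq_len, vocab) raw scores from the ASR decoder
    targets: (batch, seq_len) token ids, with pad_id marking positions to ignore
    langs:   list of language codes, one per utterance in the batch
    """
    batch, seq_len, vocab = logits.shape
    # Per-token cross-entropy without reduction so it can be reweighted before averaging.
    ce = F.cross_entropy(
        logits.reshape(-1, vocab),
        targets.reshape(-1),
        ignore_index=pad_id,
        reduction="none",
    ).reshape(batch, seq_len)
    weights = torch.tensor(
        [LANG_WEIGHTS[lang] for lang in langs], device=logits.device
    ).unsqueeze(1)                       # (batch, 1), broadcast over time steps
    mask = (targets != pad_id).float()   # exclude padded positions from the average
    return (ce * weights * mask).sum() / mask.sum()

A loss of this shape could replace the standard cross-entropy term when fine-tuning a pre-trained multilingual model such as Whisper, boosting the contribution of the low-resource language without changing the optimization of the high-resource ones.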
Pages: 1235-1239
Page count: 5