Language fusion via adapters for low-resource speech recognition

Times Cited: 1
|
Authors
Hu, Qing [1 ]
Zhang, Yan [1 ]
Zhang, Xianlei [1 ]
Han, Zongyu [1 ]
Liang, Xiuxia
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence & Data Sci, Tianjin 300401, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech recognition; Low-resource languages; Language fusion; Adapter-tuning;
DOI
10.1016/j.specom.2024.103037
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Data scarcity makes low-resource speech recognition systems suffer from severe overfitting. Although fine-tuning addresses this issue to some extent, it leads to parameter-inefficient training. In this paper, a novel language knowledge fusion method, named LanFusion, is proposed. It is built on the recently popular adapter-tuning technique, and thus maintains better parameter efficiency than conventional fine-tuning methods. LanFusion is a two-stage method. Specifically, multiple adapters are first trained on several source languages to extract language-specific and language-invariant knowledge. Then, the trained adapters are retrained on the target low-resource language to fuse the learned knowledge. Compared with the Vanilla-adapter baseline, LanFusion obtains relative average word error rate (WER) reductions of 9.8% and 8.6% on the Common Voice and FLEURS corpora, respectively. Extensive experiments demonstrate that the proposed method is not only simple and effective but also parameter-efficient. Moreover, using source languages that are geographically close to the target language yields better results on both datasets.
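The two-stage scheme in the abstract can be illustrated with a minimal, framework-free sketch: bottleneck adapters (down-project, nonlinearity, up-project, residual) stand in for the source-language adapters of stage one, and stage two is shown as an interpolation of their outputs on a target-language hidden state. All class and function names, dimensions, and the fixed fusion weights are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the LanFusion idea: per-source-language bottleneck
# adapters (stage 1) whose outputs are fused for the target language (stage 2).
import random

random.seed(0)

def linear(x, w, b):
    """y = xW + b for a single vector x (list of floats)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def relu(v):
    return [max(0.0, vi) for vi in v]

class BottleneckAdapter:
    """Down-project -> ReLU -> up-project, with a residual connection.
    In adapter-tuning only these small weights are trained; the backbone
    acoustic model stays frozen."""
    def __init__(self, dim, bottleneck):
        rnd = lambda r, c: [[random.uniform(-0.1, 0.1) for _ in range(c)]
                            for _ in range(r)]
        self.w_down, self.b_down = rnd(dim, bottleneck), [0.0] * bottleneck
        self.w_up, self.b_up = rnd(bottleneck, dim), [0.0] * dim

    def forward(self, h):
        z = relu(linear(h, self.w_down, self.b_down))
        out = linear(z, self.w_up, self.b_up)
        return [hi + oi for hi, oi in zip(h, out)]  # residual connection

def fuse(adapters, h, weights):
    """Stage 2 (sketch): combine the source-language adapters' outputs with
    interpolation weights (fixed here; learned on the target language in
    the paper's setting)."""
    outs = [a.forward(h) for a in adapters]
    return [sum(w * o[j] for w, o in zip(weights, outs))
            for j in range(len(h))]

# Stage 1: one adapter per source language (dimensions are arbitrary here).
adapters = [BottleneckAdapter(dim=8, bottleneck=2) for _ in range(3)]
hidden = [0.5] * 8  # a dummy backbone hidden state
fused = fuse(adapters, hidden, weights=[0.4, 0.3, 0.3])
print(len(fused))  # the fused representation keeps the backbone's dimension
```

Because only the bottleneck weights are trainable, the trainable-parameter count scales with the adapter size rather than with the full backbone, which is the parameter-efficiency argument the abstract makes against conventional fine-tuning.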
Pages: 7