Language fusion via adapters for low-resource speech recognition

Cited by: 1
Authors
Hu, Qing [1]
Zhang, Yan [1]
Zhang, Xianlei [1]
Han, Zongyu [1]
Liang, Xiuxia
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence & Data Sci, Tianjin 300401, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; Low-resource languages; Language fusion; Adapter-tuning
DOI
10.1016/j.specom.2024.103037
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Data scarcity makes low-resource speech recognition systems suffer from severe overfitting. Although fine-tuning addresses this issue to some extent, it leads to parameter-inefficient training. In this paper, a novel language knowledge fusion method, named LanFusion, is proposed. It is built on the recently popular adapter-tuning technique, thus maintaining better parameter efficiency compared with conventional fine-tuning methods. LanFusion is a two-stage method. Specifically, multiple adapters are first trained on several source languages to extract language-specific and language-invariant knowledge. Then, the trained adapters are retrained on the target low-resource language to fuse the learned knowledge. Compared with Vanilla-adapter, LanFusion obtains a relative average word error rate (WER) reduction of 9.8% and 8.6% on the Common Voice and FLEURS corpora, respectively. Extensive experiments demonstrate that the proposed method is not only simple and effective but also parameter-efficient. Besides, using source languages that are geographically similar to the target language yields better results on both datasets.
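The abstract reports its gains as relative WER reductions. As a reading aid, the sketch below shows how WER and a relative reduction are conventionally computed, assuming the standard word-level edit-distance definition. The function names and the example figures (50.0 and 45.1) are hypothetical illustrations, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only: a baseline at 50.0% WER
# and a new system at 45.1% WER differ by a relative 9.8%.
wer = word_error_rate("a b c d", "a x c")   # one substitution, one deletion
gain = relative_reduction(50.0, 45.1)
```

A relative reduction is measured against the baseline's own WER, so a 9.8% relative gain is smaller in absolute terms than a 9.8-point drop.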
Pages: 7
Related papers
50 in total
  • [1] Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
    Hou, Wenxin
    Zhu, Han
    Wang, Yidong
    Wang, Jindong
    Qin, Tao
    Xu, Renju
    Shinozaki, Takahiro
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 317 - 329
  • [2] Cross-Lingual Language Modeling for Low-Resource Speech Recognition
    Xu, Ping
    Fung, Pascale
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06) : 1134 - 1144
  • [3] A General Procedure for Improving Language Models in Low-Resource Speech Recognition
    Liu, Qian
    Zhang, Wei-Qiang
    Liu, Jia
    Liu, Yao
    [J]. PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 428 - 433
  • [4] Language-Adversarial Transfer Learning for Low-Resource Speech Recognition
    Yi, Jiangyan
    Tao, Jianhua
    Wen, Zhengqi
    Bai, Ye
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 621 - 630
  • [5] Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers
    Reitmaier, Thomas
    Wallington, Electra
    Raju, Dani Kalarikalayil
    Klejch, Ondrej
    Pearson, Jennifer
    Jones, Matt
    Bell, Peter
    Robinson, Simon
    [J]. PROCEEDINGS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI '22), 2022
  • [6] CURRICULUM OPTIMIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
    Kuznetsova, Anastasia
    Kumar, Anurag
    Fox, Jennifer Drexler
    Tyers, Francis M.
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8187 - 8191
  • [7] Investigate Automatic Speech Recognition and Keyword Search for Very Low-Resource Language
    Ni, Chongjia
    Ma, Bin
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2017, : 336 - 340
  • [8] Enrollment in low-resource speech recognition systems
    Deligne, S
    Dharanipragada, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 341 - 344
  • [9] CAM: A cross-lingual adaptation framework for low-resource language speech recognition
    Hu, Qing
    Zhang, Yan
    Zhang, Xianlei
    Han, Zongyu
    Yu, Xilong
    [J]. INFORMATION FUSION, 2024, 111
  • [10] DEEP MAXOUT NETWORKS FOR LOW-RESOURCE SPEECH RECOGNITION
    Miao, Yajie
    Metze, Florian
    Rawat, Shourabh
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 398 - 403