Language fusion via adapters for low-resource speech recognition

Cited by: 1
Authors
Hu, Qing [1]
Zhang, Yan [1]
Zhang, Xianlei [1]
Han, Zongyu [1]
Liang, Xiuxia
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence & Data Sci, Tianjin 300401, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech recognition; Low-resource languages; Language fusion; Adapter-tuning;
DOI
10.1016/j.specom.2024.103037
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Data scarcity makes low-resource speech recognition systems suffer from severe overfitting. Although fine-tuning addresses this issue to some extent, it leads to parameter-inefficient training. In this paper, a novel language knowledge fusion method, named LanFusion, is proposed. It is built on the recently popular adapter-tuning technique, thus maintaining better parameter efficiency than conventional fine-tuning methods. LanFusion is a two-stage method. Specifically, multiple adapters are first trained on several source languages to extract language-specific and language-invariant knowledge. Then, the trained adapters are retrained on the target low-resource language to fuse the learned knowledge. Compared with Vanilla-adapter, LanFusion obtains relative average word error rate (WER) reductions of 9.8% and 8.6% on the Common Voice and FLEURS corpora, respectively. Extensive experiments demonstrate that the proposed method is not only simple and effective but also parameter-efficient. In addition, using source languages that are geographically similar to the target language yields better results on both datasets.
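The abstract reports its gains as relative WER reductions. As a reading aid, the following is a minimal Python sketch of how WER and a relative reduction are conventionally computed. It is not taken from the paper: the function names `word_error_rate` and `relative_reduction` are hypothetical, and the numbers in the usage lines are illustrative placeholders, not results reported by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not from the paper).
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(relative_reduction(20.0, 18.04))
```

Under this convention, a relative reduction of 9.8% means the new WER is 90.2% of the baseline WER, which is a smaller change than a drop of 9.8 absolute percentage points.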
Pages: 7
Related papers
50 in total
  • [21] Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formosa Speech Recognition Challenges
    Lin, Hung-Pang
    Zhang, Yu-Jia
    Chen, Chia-Ping
    [J]. INTERSPEECH 2021, 2021, : 4339 - 4343
  • [22] Convolutional Maxout Neural Networks for Low-Resource Speech Recognition
    Cai, Meng
    Shi, Yongzhe
    Kang, Jian
    Liu, Jia
    Su, Tengrong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 133 - +
  • [23] Low-resource Sinhala Speech Recognition using Deep Learning
    Karunathilaka, Hirunika
    Welgama, Viraj
    Nadungodage, Thilini
    Weerasinghe, Ruvan
    [J]. 2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 196 - 201
  • [24] Multilingual acoustic models for speech recognition in low-resource devices
    Garcia, Enrique Gil
    Mengusoglu, Erhan
    Janke, Eric
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 981 - +
  • [25] Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Zhao, Xiaoyan
    Liang, Zhenlin
    Du, Jing
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (07) : 1352 - 1355
  • [26] MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION
    Meng, Linghui
    Xu, Jin
    Tan, Xu
    Wang, Jindong
    Qin, Tao
    Xu, Bo
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7008 - 7012
  • [27] Meta adversarial learning improves low-resource speech recognition
    Chen, Yaqi
    Yang, Xukui
    Zhang, Hao
    Zhang, Wenlin
    Qu, Dan
    Chen, Cong
    [J]. COMPUTER SPEECH AND LANGUAGE, 2024, 84
  • [28] EXPLORING EFFECTIVE DATA UTILIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
    Zhou, Zhikai
    Wang, Wei
    Zhang, Wangyou
    Qian, Yanmin
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8192 - 8196
  • [29] META-LEARNING FOR LOW-RESOURCE SPEECH EMOTION RECOGNITION
    Chopra, Suransh
    Mathur, Puneet
    Sawhney, Ramit
    Shah, Rajiv Ratn
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6259 - 6263
  • [30] STOCHASTIC POOLING MAXOUT NETWORKS FOR LOW-RESOURCE SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,