Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Cited by: 22
Authors
Hou, Wenxin [1 ,2 ]
Zhu, Han [3 ]
Wang, Yidong [1 ]
Wang, Jindong [4 ]
Qin, Tao [4 ]
Xu, Renju [5 ]
Shinozaki, Takahiro [1 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo 1528550, Japan
[2] Microsoft, Suzhou 215123, Peoples R China
[3] Chinese Acad Sci, Inst Acoust, Beijing 100045, Peoples R China
[4] Microsoft Res Asia, Beijing 100080, Peoples R China
[5] Zhejiang Univ, Ctr Data Sci, Hangzhou 310027, Peoples R China
Keywords
Adaptation models; Task analysis; Speech recognition; Transformers; Training; Training data; Data models; cross-lingual adaptation; meta-learning; parameter-efficiency;
DOI
10.1109/TASLP.2021.3138674
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline Codes: 070206; 082403
Abstract
Cross-lingual speech adaptation aims to leverage multiple rich-resource languages to build models for a low-resource target language. Because the low-resource language has limited training data, speech recognition models easily overfit. An adapter is a versatile module that can be plugged into a Transformer for parameter-efficient learning. In this paper, we propose to use adapters for parameter-efficient cross-lingual speech adaptation. Building on our previous MetaAdapter, which implicitly leverages adapters, we propose a novel algorithm called SimAdapter for explicitly learning knowledge from adapters. Both algorithms can be easily integrated into the Transformer architecture. MetaAdapter uses meta-learning to transfer general knowledge from the training data to the test language. SimAdapter learns the similarities between the source and target languages during fine-tuning using the adapters. We conduct extensive experiments on five low-resource languages in the Common Voice dataset. Results demonstrate that MetaAdapter and SimAdapter reduce WER by 2.98% and 2.55% with only 2.5% and 15.5% of the trainable parameters, respectively, compared to a strong full-model fine-tuning baseline. Moreover, we show that the two algorithms can be combined for better performance, with up to 3.55% relative WER reduction.
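The two building blocks described in the abstract can be sketched in a few lines: a bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) and a SimAdapter-style fusion that attends over the outputs of several frozen source-language adapters. This is a minimal illustrative sketch, not the paper's implementation; the language names, dimensions, and the exact form of the similarity score are assumptions for demonstration only.

```python
import numpy as np

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, add residual."""
    h = np.maximum(x @ W_down, 0.0)   # down-projection + nonlinearity
    return x + h @ W_up               # up-projection + residual connection

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2

# One adapter per source language (hypothetical language IDs)
langs = ["lang_a", "lang_b", "lang_c"]
adapters = {
    l: (rng.normal(size=(d_model, d_bottleneck)) * 0.1,
        rng.normal(size=(d_bottleneck, d_model)) * 0.1)
    for l in langs
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sim_adapter(x, W_q):
    """SimAdapter-style fusion (illustrative): attend over the outputs of
    the source-language adapters, weighting each by its similarity to a
    query derived from the target-language hidden state."""
    outs = np.stack([adapter(x, *adapters[l]) for l in langs])  # (L, d_model)
    scores = outs @ (W_q @ x)          # one similarity score per source language
    weights = softmax(scores)          # attention weights over source languages
    return weights @ outs              # similarity-weighted combination

x = rng.normal(size=d_model)                       # a hidden state from the Transformer
W_q = rng.normal(size=(d_model, d_model)) * 0.1    # learned query projection (assumed)
y = sim_adapter(x, W_q)
print(y.shape)  # (8,)
```

In the paper only the adapter (and fusion) parameters would be trained while the Transformer backbone stays frozen, which is where the reported 2.5%-15.5% trainable-parameter figures come from.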
Pages: 317-329
Number of pages: 13
Related Papers (50 total)
  • [1] Cross-Lingual Language Modeling for Low-Resource Speech Recognition
    Xu, Ping
    Fung, Pascale
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (06): : 1134 - 1144
  • [2] Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
    Qian, Yanmin
    Liu, Jia
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2581 - 2584
  • [3] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
  • [5] SUBSPACE MIXTURE MODEL FOR LOW-RESOURCE SPEECH RECOGNITION IN CROSS-LINGUAL SETTINGS
    Miao, Yajie
    Metze, Florian
    Waibel, Alex
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7339 - 7343
  • [7] CAM: A cross-lingual adaptation framework for low-resource language speech recognition
    Hu, Qing
    Zhang, Yan
    Zhang, Xianlei
    Han, Zongyu
    Yu, Xilong
    [J]. INFORMATION FUSION, 2024, 111
  • [8] Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
    Zhang, Mozhi
    Fujinuma, Yoshinari
    Boyd-Graber, Jordan
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9547 - 9554
  • [9] Improving cross-lingual low-resource speech recognition by Task-based Meta PolyLoss
    Chen, Yaqi
    Zhang, Hao
    Yang, Xukui
    Zhang, Wenlin
    Qu, Dan
    [J]. COMPUTER SPEECH AND LANGUAGE, 2024, 87
  • [10] Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition
    Zi-Qiang Zhang
    Yan Song
    Ming-Hui Wu
    Xin Fang
    Ian McLoughlin
    Li-Rong Dai
    [J]. Circuits, Systems, and Signal Processing, 2022, 41 : 6827 - 6843