Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Cited: 33
|
Authors
Hou, Wenxin [1,2]
Zhu, Han [3]
Wang, Yidong [1]
Wang, Jindong [4]
Qin, Tao [4]
Xu, Renju [5]
Shinozaki, Takahiro [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo 1528550, Japan
[2] Microsoft, Suzhou 215123, Peoples R China
[3] Chinese Acad Sci, Inst Acoust, Beijing 100045, Peoples R China
[4] Microsoft Res Asia, Beijing 100080, Peoples R China
[5] Zhejiang Univ, Ctr Data Sci, Hangzhou 310027, Peoples R China
Keywords
Adaptation models; Task analysis; Speech recognition; Transformers; Training; Training data; Data models; cross-lingual adaptation; meta-learning; parameter-efficiency
DOI
10.1109/TASLP.2021.3138674
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Cross-lingual speech adaptation leverages multiple rich-resource languages to build models for a low-resource target language. Because the low-resource language has limited training data, speech recognition models overfit easily. The adapter is a versatile module that can be plugged into a Transformer for parameter-efficient learning. In this paper, we propose to use adapters for parameter-efficient cross-lingual speech adaptation. Building on our previous MetaAdapter, which leverages adapters implicitly, we propose a novel algorithm called SimAdapter that learns knowledge from adapters explicitly. Both algorithms can be easily integrated into the Transformer architecture. MetaAdapter uses meta-learning to transfer general knowledge from the training data to the test language, while SimAdapter learns the similarities between the source and target languages through the adapters during fine-tuning. We conduct extensive experiments on five low-resource languages from the Common Voice dataset. Results show that MetaAdapter and SimAdapter reduce WER by 2.98% and 2.55%, respectively, while using only 2.5% and 15.5% of the trainable parameters of the strong full-model fine-tuning baseline. Moreover, the two algorithms can be combined for better performance, with up to a 3.55% relative WER reduction.
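The abstract describes bottleneck adapters inserted into a frozen Transformer and a SimAdapter layer that attends over per-language adapters to learn source-target similarities. The following is only a minimal PyTorch sketch of that general idea, assuming a standard bottleneck adapter design; all names (Adapter, SimAdapterFusion, d_model, bottleneck, num_sources) are hypothetical illustrations, not the authors' released code.

# Hypothetical sketch: bottleneck adapter + SimAdapter-style fusion over
# several source-language adapters. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter placed after a Transformer sub-layer."""
    def __init__(self, d_model: int = 256, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, d_model)     # project back up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact.
        return x + self.up(F.relu(self.down(self.norm(x))))

class SimAdapterFusion(nn.Module):
    """Attention-style fusion over the outputs of several source-language adapters.

    The frozen backbone's hidden state acts as the query; each source-language
    adapter output acts as a key/value. The learned attention weights can be
    read as similarities between the target and source languages.
    """
    def __init__(self, d_model: int = 256, num_sources: int = 4):
        super().__init__()
        self.adapters = nn.ModuleList(Adapter(d_model) for _ in range(num_sources))
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) hidden states from a frozen Transformer layer.
        outs = torch.stack([a(x) for a in self.adapters], dim=2)   # (B, T, S, D)
        q = self.query(x).unsqueeze(2)                             # (B, T, 1, D)
        k = self.key(outs)                                         # (B, T, S, D)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5               # (B, T, S)
        weights = scores.softmax(dim=-1).unsqueeze(-1)             # (B, T, S, 1)
        return x + (weights * outs).sum(dim=2)                     # fused residual

In such a setup, only the adapter and fusion parameters would be trained for the target language while the backbone stays frozen, which is consistent with the small trainable-parameter fractions reported in the abstract.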
Pages: 317-329
Page count: 13
Related Papers
50 records in total
  • [41] Adversarial Cross-Lingual Transfer Learning for Slot Tagging of Low-Resource Languages
    He, Keqing
    Yan, Yuanmeng
    Xu, Weiran
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [42] Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD
    Taghizadeh, Nasrin
    Faili, Hesham
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 56 : 61 - 87
  • [43] Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning
    Agrawal, Ashish Sunil
    Fazili, Barah
    Jyothi, Preethi
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 319 - 329
  • [44] Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
    Schlichtkrull, Michael Sejr
    Sogaard, Anders
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 220 - 229
  • [45] Is Translation Helpful? An Exploration of Cross-Lingual Transfer in Low-Resource Dialog Generation
    Shen, Lei
    Yu, Shuai
    Shen, Xiaoyu
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,
  • [46] ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
    Casanova, Edresson
    Shulby, Christopher
    Korolev, Alexander
    Candido Junior, Arnaldo
    Soares, Anderson da Silva
    Aluisio, Sandra
    Ponti, Moacir Antonelli
    INTERSPEECH 2023, 2023, : 1244 - 1248
  • [47] IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS
    Le Minh Nguyen
    Nayak, Shekhar
    Coler, Matt
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 792 - 797
  • [48] Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages
    Eskander, Ramy
    Lowry, Cass
    Khandagale, Sujay
    Klavans, Judith
    Polinsky, Maria
    Muresan, Smaranda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 4061 - 4072
  • [49] Intent detection and slot filling for Persian: Cross-lingual training for low-resource languages
    Zadkamali, Reza
    Momtazi, Saeedeh
    Zeinali, Hossein
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): : 559 - 574
  • [50] Augmenting Low-Resource Cross-Lingual Summarization with Progression-Grounded Training and Prompting
    Ma, Jiu Shun
    Huang, Yuxin
    Wang, Linqin
    Huang, Xiang
    Peng, Hao
    Yu, Zhengtao
    Yu, Philip
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (09)