Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Cited by: 22

Authors
Hou, Wenxin [1 ,2 ]
Zhu, Han [3 ]
Wang, Yidong [1 ]
Wang, Jindong [4 ]
Qin, Tao [4 ]
Xu, Renju [5 ]
Shinozaki, Takahiro [1 ]
Affiliations
[1] Tokyo Institute of Technology, Tokyo 152-8550, Japan
[2] Microsoft, Suzhou 215123, China
[3] Institute of Acoustics, Chinese Academy of Sciences, Beijing 100045, China
[4] Microsoft Research Asia, Beijing 100080, China
[5] Center for Data Science, Zhejiang University, Hangzhou 310027, China
Keywords
Adaptation models; Task analysis; Speech recognition; Transformers; Training; Training data; Data models; cross-lingual adaptation; meta-learning; parameter-efficiency
DOI
10.1109/TASLP.2021.3138674
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Cross-lingual speech adaptation aims to leverage multiple rich-resource languages to build models for a low-resource target language. Because the low-resource language has limited training data, speech recognition models can easily overfit. The adapter is a versatile module that can be plugged into a Transformer for parameter-efficient learning. In this paper, we propose to use adapters for parameter-efficient cross-lingual speech adaptation. Building on our previous MetaAdapter, which implicitly leverages adapters, we propose a novel algorithm called SimAdapter for explicitly learning knowledge from adapters. Both algorithms can be easily integrated into the Transformer structure. MetaAdapter uses meta-learning to transfer general knowledge from the training data to the test language, while SimAdapter learns the similarities between the source and target languages during fine-tuning using the adapters. We conduct extensive experiments on five low-resource languages in the Common Voice dataset. Results demonstrate that MetaAdapter and SimAdapter reduce WER by 2.98% and 2.55%, respectively, with only 2.5% and 15.5% of the trainable parameters of the strong full-model fine-tuning baseline. Moreover, the two algorithms can be integrated for better performance, with up to 3.55% relative WER reduction.
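For a concrete picture of the two ingredients the abstract names, below is a minimal PyTorch sketch: a bottleneck adapter inserted into a frozen Transformer layer, and a SimAdapter-style attention fusion over several source-language adapters. The class names, bottleneck width, and exact attention form are illustrative assumptions, not the paper's published implementation (see the DOI above for the authors' actual design).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual.

    During adaptation only these weights are trained; the Transformer
    backbone stays frozen, which keeps the trainable-parameter count small.
    """

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection, so an untrained adapter stays near identity.
        return x + self.up(F.relu(self.down(self.norm(x))))


class SimAdapterFusion(nn.Module):
    """SimAdapter-style fusion (sketch): attention over the outputs of
    several source-language adapters, so the target language can learn
    how similar it is to each source language.
    """

    def __init__(self, d_model: int, num_sources: int, bottleneck: int = 64):
        super().__init__()
        self.source_adapters = nn.ModuleList(
            [Adapter(d_model, bottleneck) for _ in range(num_sources)]
        )
        # In this sketch, the attention projections are the newly trained part.
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) hidden states from a frozen Transformer layer.
        outs = torch.stack([a(x) for a in self.source_adapters], dim=2)  # (B, T, S, D)
        q = self.query(x).unsqueeze(2)                                   # (B, T, 1, D)
        k = self.key(outs)                                               # (B, T, S, D)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5                     # (B, T, S)
        weights = scores.softmax(dim=-1).unsqueeze(-1)  # similarity per source
        return (weights * outs).sum(dim=2)              # fused (B, T, D)


# Toy usage: fuse four source-language adapters for a 256-dim model.
fusion = SimAdapterFusion(d_model=256, num_sources=4)
hidden = torch.randn(8, 100, 256)   # (batch, frames, d_model)
fused = fusion(hidden)              # (8, 100, 256)
```

MetaAdapter, by contrast, keeps the adapter architecture unchanged: as the abstract describes, it uses meta-learning over the source languages to find an adapter initialization that adapts quickly to the unseen target language.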
Pages: 317-329 (13 pages)
Related papers (items [31]-[40] of 50)
• [31] Byambadorj, Zolzaya; Nishimura, Ryota; Ayush, Altangerel; Ohta, Kengo; Kitaoka, Norihide. Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation. EURASIP Journal on Audio, Speech, and Music Processing, 2021, 2021(1).
• [32] He, Keqing; Yan, Yuanmeng; Xu, Weiran. Adversarial Cross-Lingual Transfer Learning for Slot Tagging of Low-Resource Languages. 2020 International Joint Conference on Neural Networks (IJCNN), 2020.
• [34] Taghizadeh, Nasrin; Faili, Hesham. Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD. Journal of Artificial Intelligence Research, 2016, 56: 61-87.
• [35] Schlichtkrull, Michael Sejr; Søgaard, Anders. Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages. 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Vol. 1: Long Papers, 2017: 220-229.
• [36] Nguyen, Le Minh; Nayak, Shekhar; Coler, Matt. Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations. 2022 IEEE Spoken Language Technology Workshop (SLT), 2022: 792-797.
• [37] Eskander, Ramy; Lowry, Cass; Khandagale, Sujay; Klavans, Judith; Polinsky, Maria; Muresan, Smaranda. Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages. NAACL 2022: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 4061-4072.
• [38] Zadkamali, Reza; Momtazi, Saeedeh; Zeinali, Hossein. Intent detection and slot filling for Persian: Cross-lingual training for low-resource languages. Natural Language Processing, 2024.
• [39] Zhang, Rui; Westerfield, Caitlin; Shim, Sungrok; Bingham, Garrett; Fabbri, Alexander; Hu, William; Verma, Neha; Radev, Dragomir. Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations. 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), 2019: 3173-3179.
• [40] Abhishek, Tushar; Sagare, Shivprasad; Singh, Bhavyajeet; Sharma, Anubhav; Gupta, Manish; Varma, Vasudeva. XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages. Companion Proceedings of the Web Conference 2022 (WWW 2022 Companion), 2022: 171-175.