Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition

Cited by: 22
|
Authors
Hou, Wenxin [1 ,2 ]
Zhu, Han [3 ]
Wang, Yidong [1 ]
Wang, Jindong [4 ]
Qin, Tao [4 ]
Xu, Renju [5 ]
Shinozaki, Takahiro [1 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo 1528550, Japan
[2] Microsoft, Suzhou 215123, Peoples R China
[3] Chinese Acad Sci, Inst Acoust, Beijing 100045, Peoples R China
[4] Microsoft Res Asia, Beijing 100080, Peoples R China
[5] Zhejiang Univ, Ctr Data Sci, Hangzhou 310027, Peoples R China
Keywords
Adaptation models; Task analysis; Speech recognition; Transformers; Training; Training data; Data models; cross-lingual adaptation; meta-learning; parameter-efficiency
DOI
10.1109/TASLP.2021.3138674
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Cross-lingual speech adaptation aims to leverage multiple rich-resource languages to build models for a low-resource target language. Because the low-resource language has limited training data, speech recognition models can easily overfit. The adapter is a versatile module that can be plugged into a Transformer for parameter-efficient learning. In this paper, we propose to use adapters for parameter-efficient cross-lingual speech adaptation. Building on our previous MetaAdapter, which leverages adapters implicitly, we propose a novel algorithm called SimAdapter that learns knowledge from adapters explicitly. Both algorithms can be easily integrated into the Transformer structure. MetaAdapter uses meta-learning to transfer general knowledge from the training data to the test language, while SimAdapter learns the similarities between the source and target languages during fine-tuning using the adapters. We conduct extensive experiments on five low-resource languages from the Common Voice dataset. Results demonstrate that MetaAdapter and SimAdapter reduce WER by 2.98% and 2.55% with only 2.5% and 15.5% of the trainable parameters, respectively, compared with a strong full-model fine-tuning baseline. Moreover, the two algorithms can be combined for better performance, with up to a 3.55% relative WER reduction.
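The adapter referred to in the abstract is, in its common formulation, a small bottleneck block with a residual connection inserted into each Transformer layer: only the down- and up-projection weights are trained per language, which is why the trainable-parameter counts above are a few percent of the full model. A minimal pure-Python sketch of such a block (all dimensions, weight shapes, and the near-zero initialization are illustrative assumptions, not the paper's exact configuration):

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, u) for u in v]

class Adapter:
    """Bottleneck adapter: h + W_up @ relu(W_down @ h).

    Only W_down (d_model -> r) and W_up (r -> d_model) are trained
    per target language, so trainable parameters scale with 2*d*r
    rather than with the full Transformer size.
    """
    def __init__(self, d_model, r, seed=0):
        rng = random.Random(seed)
        init = lambda rows, cols: [
            [rng.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)
        ]
        self.W_down = init(r, d_model)  # project hidden state into bottleneck
        self.W_up = init(d_model, r)    # project back to model dimension

    def __call__(self, h):
        z = relu(matvec(self.W_down, h))
        up = matvec(self.W_up, z)
        return [h_i + u_i for h_i, u_i in zip(h, up)]  # residual connection

# Near-zero initialization makes the adapter start close to the identity,
# so plugging it into a pretrained Transformer barely perturbs its outputs.
adapter = Adapter(d_model=8, r=2)
h = [1.0] * 8
out = adapter(h)
print(max(abs(o - x) for o, x in zip(out, h)))  # small: near-identity at init
```

Because the residual path dominates at initialization, the pretrained model's behavior is preserved and the adapter only gradually learns a language-specific correction during fine-tuning.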
Pages: 317-329 (13 pages)