Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Cited: 0

Authors
Xiao, Yubei [1 ]
Gong, Ke [3 ]
Zhou, Pan [4 ]
Zheng, Guolin [1 ]
Liang, Xiaodan [2 ,3 ]
Lin, Liang [1 ,3 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangzhou, Guangdong, Peoples R China
[3] Dark Matter AI Res, Abu Dhabi, U Arab Emirates
[4] SalesForce, San Francisco, CA USA
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Low-resource automatic speech recognition (ASR) is challenging because the target-language data are too scarce to train an ASR model well. To address this, meta-learning formulates ASR on each source language as many small ASR tasks and meta-learns a model initialization over all tasks from the different source languages, enabling fast adaptation to unseen target languages. However, task quantity and difficulty vary greatly across source languages because of their different data scales and diverse phonological systems, which leads to task-quantity and task-difficulty imbalance issues and thus to the failure of multilingual meta-learning ASR (MML-ASR). In this work, we solve this problem by developing a novel adversarial meta sampling (AMS) approach to improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability of each source language. Specifically, a large query loss for a source language indicates that its tasks have not been sampled sufficiently to train the ASR model, in terms of either their quantity or their difficulty, and that the language should therefore be sampled more frequently for extra learning. Motivated by this observation, we feed the historical task query losses of all source-language domains into a network that learns a task sampling policy by adversarially increasing the current query loss of MML-ASR. The learnt policy thus tracks the learning progress of each language and predicts good task sampling probabilities for more effective learning. Finally, experimental results on two multilingual datasets show significant performance improvements when applying AMS to MML-ASR, and also demonstrate its applicability to other low-resource speech tasks and to transfer-learning ASR approaches.
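The sampling mechanism described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear policy over a fixed loss-history window, the REINFORCE-style update that increases expected query loss, and the name `AdversarialTaskSampler` are all assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # numerically stable softmax over per-language scores
    e = np.exp(x - x.max())
    return e / e.sum()

class AdversarialTaskSampler:
    """Hypothetical sketch of AMS: a linear policy maps each source
    language's recent query losses to a task-sampling probability."""

    def __init__(self, n_langs, history=3, lr=0.1):
        self.w = np.zeros((n_langs, history))            # per-language policy weights
        self.hist = [[1.0] * history for _ in range(n_langs)]  # recent query losses
        self.lr = lr

    def probs(self):
        # score each language by its weighted loss history, normalize
        scores = np.array([w @ np.array(h) for w, h in zip(self.w, self.hist)])
        return softmax(scores)

    def sample(self):
        p = self.probs()
        return int(rng.choice(len(p), p=p)), p

    def update(self, lang, query_loss, p):
        # REINFORCE-style step that *increases* expected query loss
        # (the adversarial objective); only the sampled language's
        # weights are updated here, for simplicity.
        h = np.array(self.hist[lang])
        grad_log_p = h * (1.0 - p[lang])   # d log p[lang] / d w[lang]
        self.w[lang] += self.lr * query_loss * grad_log_p
        self.hist[lang] = self.hist[lang][1:] + [query_loss]

# demo: language 1 consistently incurs the larger query loss, so the
# adversarial policy shifts sampling mass toward it over time
sampler = AdversarialTaskSampler(n_langs=2)
true_loss = [0.5, 2.0]
for _ in range(200):
    lang, p = sampler.sample()
    sampler.update(lang, true_loss[lang], p)
```

In this toy run the under-trained (high-loss) language ends up with the larger sampling probability, mirroring the paper's intuition that languages with large query losses should be sampled more frequently.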
Pages: 14112-14120
Page count: 9