Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech

被引:3
|
作者
Ghorbani, Shahram [1 ]
Hansen, John H. L. [1 ]
机构
[1] Univ Texas Dallas, Ctr Robust Speech Syst, Richardson, TX 75080 USA
关键词
Adaptation models; Computational modeling; Training; Speech processing; Data models; Task analysis; Training data; Accented speech; continual learning; domain expansion; end-to-end systems; model adaptation; speech re- cognition; ADAPTATION;
D O I
10.1109/TASLP.2022.3233238
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Training Automatic Speech Recognition (ASR) systems with sequentially incoming data from alternate domains is an essential milestone in order to reach human intelligibility level in speech recognition. The main challenge of sequential learning is that current adaptation techniques result in significant performance degradation for previously-seen domains. To mitigate the catastrophic forgetting problem, this study proposes effective domain expansion techniques for two scenarios: 1) where only new domain data is available, and 2) where both prior and new domain data are available. We examine the efficacy of the approaches through experiments on adapting a model trained with native English to different English accents. For the first scenario, we study several existing and proposed regularization-based approaches to mitigate performance loss of initial data. The experiments demonstrate the superior performance of our proposed Soft KL-Divergence (SKLD)-Model Averaging (MA) approach. In this approach, SKLD first alleviates the forgetting problem during adaptation; next, MA makes the final efficient compromise between the two domains by averaging parameters of the initial and adapted models. For the second scenario, we explore several rehearsal-based approaches, which leverage initial data to maintain the original model performance. We propose Gradient Averaging (GA) as well as an approach which operates by averaging gradients computed for both initial and new domains. Experiments demonstrate that GA outperforms retraining and specifically designed continual learning approaches, such as Averaged Gradient Episodic Memory (AGEM). Moreover, GA significantly improves computational costs over the complete retraining approach.
引用
收藏
页码:762 / 774
页数:13
相关论文
共 50 条
  • [1] END-TO-END MULTI-ACCENT SPEECH RECOGNITION WITH UNSUPERVISED ACCENT MODELLING
    Li, Song
    Ouyang, Beibei
    Liao, Dexin
    Xia, Shipeng
    Li, Lin
    Hong, Qingyang
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6418 - 6422
  • [2] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 297 - 301
  • [3] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    [J]. INTERSPEECH 2022, 2022, : 1387 - 1391
  • [4] Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios
    Kumar, Ankur
    Singh, Sachin
    Gowda, Dhananjaya
    Garg, Abhinav
    Singh, Shatrughan
    Kim, Chanwoo
    [J]. INTERSPEECH 2020, 2020, : 4357 - 4361
  • [5] End-to-end Speech Synthesis for Tibetan Lhasa Dialect
    Luo, Lisai
    Li, Guanyu
    Gong, Chunwei
    Ding, Hailan
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [6] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [7] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [8] End-to-End Speech Recognition in Russian
    Markovnikov, Nikita
    Kipyatkova, Irina
    Lyakso, Elena
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 377 - 386
  • [9] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [10] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552