END-TO-END MULTI-ACCENT SPEECH RECOGNITION WITH UNSUPERVISED ACCENT MODELLING

被引:6
|
作者
Li, Song [1 ]
Ouyang, Beibei [1 ]
Liao, Dexin [2 ]
Xia, Shipeng [2 ]
Li, Lin [1 ]
Hong, Qingyang [2 ]
机构
[1] Xiamen Univ, Sch Elect Sci & Technol, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
基金
中国国家自然科学基金;
关键词
End-to-end; speech recognition; multi-accent; global embedding;
D O I
10.1109/ICASSP39728.2021.9414833
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
End-to-end speech recognition has achieved good recognition performance on standard English pronunciation datasets. However, one prominent problem with end-to-end speech recognition systems is that non-native English speakers tend to have complex and varied accents, which reduces the accuracy of English speech recognition in different countries. In order to grapple with such an issue, we first investigate and improve the current mainstream end-to-end multi-accent speech recognition technologies. In addition, we propose two unsupervised accent modelling methods, which convert accent information into a global embedding, and use it to improve the performance of the end-to-end multi-accent speech recognition systems. Experimental results on accented English datasets of eight countries (AESRC2020) show that, compared with the Transformer baseline, our proposed methods achieve relative 14.8% and 15.4% average word error rate (WER) reduction in the development set and evaluation set, respectively.
引用
收藏
页码:6418 / 6422
页数:5
相关论文
共 50 条
  • [31] END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION
    Tripathi, Anshuman
    Lu, Han
    Sak, Hasim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6129 - 6133
  • [32] End-to-End Multilingual Multi-Speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    [J]. INTERSPEECH 2019, 2019, : 3755 - 3759
  • [33] Streaming End-to-End Multi-Talker Speech Recognition
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 803 - 807
  • [34] CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition
    Jiangyan Yi
    Zhengqi Wen
    Jianhua Tao
    Hao Ni
    Bin Liu
    [J]. Journal of Signal Processing Systems, 2018, 90 : 985 - 997
  • [35] END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6134 - 6138
  • [36] Multi-channel Attention for End-to-End Speech Recognition
    Braun, Stefan
    Neil, Daniel
    Anumula, Jithendar
    Ceolini, Enea
    Liu, Shih-Chii
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 17 - 21
  • [37] Improving Deep Neural Networks Based Multi-Accent Mandarin Speech Recognition Using I-Vectors and Accent-Specific Top layer
    Chen, Mingming
    Yang, Zhanlei
    Liang, Jizhong
    Li, Yanpeng
    Liu, Wenju
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3620 - 3624
  • [38] End-to-end ASR framework for Indian-English accent: using speech CNN-based segmentation
    Ahmed G.
    Lawaye A.A.
    [J]. International Journal of Speech Technology, 2023, 26 (04) : 903 - 918
  • [39] END-TO-END MULTIMODAL SPEECH RECOGNITION
    Palaskar, Shruti
    Sanabria, Ramon
    Metze, Florian
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5774 - 5778
  • [40] End-to-End Speech Recognition in Russian
    Markovnikov, Nikita
    Kipyatkova, Irina
    Lyakso, Elena
    [J]. SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 377 - 386