END-TO-END MULTI-ACCENT SPEECH RECOGNITION WITH UNSUPERVISED ACCENT MODELLING

Cited by: 6
Authors
Li, Song [1 ]
Ouyang, Beibei [1 ]
Liao, Dexin [2 ]
Xia, Shipeng [2 ]
Li, Lin [1 ]
Hong, Qingyang [2 ]
Affiliations
[1] Xiamen Univ, Sch Elect Sci & Technol, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
End-to-end; speech recognition; multi-accent; global embedding;
DOI
10.1109/ICASSP39728.2021.9414833
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
End-to-end speech recognition has achieved good performance on datasets of standard English pronunciation. However, a prominent problem for end-to-end systems is that non-native English speakers have complex and varied accents, which reduces recognition accuracy across countries. To address this issue, we first investigate and improve the current mainstream end-to-end multi-accent speech recognition techniques. In addition, we propose two unsupervised accent modelling methods that convert accent information into a global embedding and use it to improve the performance of end-to-end multi-accent speech recognition systems. Experimental results on accented English data from eight countries (AESRC2020) show that, compared with a Transformer baseline, our proposed methods achieve relative average word error rate (WER) reductions of 14.8% and 15.4% on the development and evaluation sets, respectively.
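The abstract does not specify how the global accent embedding is injected into the recognizer, so the following is only a minimal, hypothetical sketch in PyTorch of the general idea: an utterance-level accent vector is obtained by mean-pooling frame features through a small network (standing in for an unsupervised accent extractor) and added to every frame before a Transformer encoder. All module names, dimensions, and the additive-fusion choice are illustrative assumptions, not the authors' method.

```python
# Sketch only: one plausible way to condition an end-to-end encoder on a
# global accent embedding. Names, sizes, and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class GlobalAccentEmbedding(nn.Module):
    """Pools frame-level features into one utterance-level accent vector."""
    def __init__(self, feat_dim: int = 80, embed_dim: int = 256):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Linear(feat_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> (batch, embed_dim)
        return self.frame_net(feats).mean(dim=1)  # temporal mean pooling

class AccentConditionedEncoder(nn.Module):
    """Adds the global accent vector to each frame before a Transformer encoder."""
    def __init__(self, feat_dim: int = 80, d_model: int = 256, layers: int = 4):
        super().__init__()
        self.accent = GlobalAccentEmbedding(feat_dim, d_model)
        self.input_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        accent_vec = self.accent(feats)                       # (batch, d_model)
        x = self.input_proj(feats) + accent_vec.unsqueeze(1)  # broadcast over time
        return self.encoder(x)

# Usage: 8 utterances, 100 frames of 80-dim filterbank features each.
enc = AccentConditionedEncoder()
out = enc(torch.randn(8, 100, 80))
print(out.shape)  # torch.Size([8, 100, 256])
```

In a full system the encoder output would feed a CTC or attention decoder; additive fusion is just one option alongside concatenation or gating, and nothing here should be read as the paper's actual architecture.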
Pages: 6418-6422
Number of pages: 5
Related Papers
50 items in total
  • [41] Overview of end-to-end speech recognition
    Wang, Song
    Li, Guanyu
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [42] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
  • [43] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 2140 - 2144
  • [44] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [45] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [46] Multi-Accent Adaptation based on Gate Mechanism
    Zhu, Han
    Wang, Li
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. INTERSPEECH 2019, 2019, : 744 - 748
  • [47] End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
    Kim, Suyoun
    Lane, Ian
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3867 - 3871
  • [48] A Purely End-to-end System for Multi-speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
  • [49] END-TO-END MULTI-MODAL SPEECH RECOGNITION WITH AIR AND BONE CONDUCTED SPEECH
    Chen, Junqi
    Wang, Mou
    Zhang, Xiao-Lei
    Huang, Zhiyong
    Rahardja, Susanto
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6052 - 6056
  • [50] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569