MULTI-ACCENT SPEECH RECOGNITION WITH HIERARCHICAL GRAPHEME BASED MODELS

Authors
Rao, Kanishka [1]
Sak, Hasim [1]
Affiliations
[1] Google Inc, Speech Grp, Mountain View, CA 94043 USA
Keywords
deep neural networks; grapheme; acoustic modelling; CTC
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. The models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer in a multi-task learning setting. Using the grapheme predictions from a hierarchical model trained on 3 million US English utterances results in a 6.7% relative word error rate (WER) increase compared to using the phoneme-based acoustic model trained on the same data. However, we show that hierarchical grapheme-based models trained jointly on the grapheme and phoneme prediction tasks with more acoustic data (12 million utterances) outperform the phoneme-only model by 6.9% relative WER. We train a single multi-dialect model using a combined US, British, Indian and Australian English data set and then adapt the model using US English data only. This adapted multi-accent model outperforms a model trained exclusively on US English. This process is repeated for phoneme-based and grapheme-based acoustic models for all four dialects, and larger improvements are obtained with grapheme models. Additionally, using a multi-accent grapheme model, we observe large recognition accuracy improvements for Indian-accented utterances in Google VoiceSearch US traffic, with a 40% relative WER reduction.
Pages: 4815-4819
Page count: 5
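
The abstract reports its results as relative WER changes (6.7%, 6.9%, 40%) rather than absolute error rates. As a minimal sketch of that arithmetic, the helper below computes a relative change from two absolute WERs. The function name `relative_wer_change` and the absolute WER values are hypothetical, chosen only to illustrate the calculation; the record does not give the paper's absolute error rates.

```python
def relative_wer_change(baseline_wer: float, candidate_wer: float) -> float:
    """Return the relative WER change in percent.

    Positive values mean the candidate is worse than the baseline;
    negative values mean a relative WER reduction.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (candidate_wer - baseline_wer) / baseline_wer * 100.0


# Hypothetical absolute WERs (in percent), NOT taken from the paper.
baseline = 10.0
increase = relative_wer_change(baseline, 10.67)   # candidate is worse
reduction = relative_wer_change(baseline, 6.0)    # candidate is better

print(f"relative change: {increase:+.1f}%")
print(f"relative change: {reduction:+.1f}%")
```

With these illustrative inputs, a baseline of 10.0% WER rising to 10.67% corresponds to a 6.7% relative increase, and a drop to 6.0% corresponds to a 40% relative reduction.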