MULTI-ACCENT SPEECH RECOGNITION WITH HIERARCHICAL GRAPHEME BASED MODELS

Authors
Rao, Kanishka [1]
Sak, Hasim [1]
Affiliations
[1] Google Inc, Speech Grp, Mountain View, CA 94043 USA
Keywords
deep neural networks; grapheme; acoustic modelling; CTC;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with a connectionist temporal classification (CTC) loss. The models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer in a multi-task learning setting. Using the grapheme predictions from a hierarchical model trained on 3 million US English utterances results in a 6.7% relative word error rate (WER) increase compared to the phoneme-based acoustic model trained on the same data. However, we show that hierarchical grapheme-based models trained jointly on the grapheme and phoneme prediction tasks with more acoustic data (12 million utterances) outperform the phoneme-only model by 6.9% relative WER. We train a single multi-dialect model using a combined US, British, Indian and Australian English data set and then adapt the model using US English data only. This adapted multi-accent model outperforms a model trained exclusively on US English. This process is repeated for phoneme-based and grapheme-based acoustic models for all four dialects, and larger improvements are obtained with the grapheme models. Additionally, using a multi-accent grapheme model, we observe large recognition accuracy improvements for Indian-accented utterances in Google VoiceSearch US traffic, with a 40% relative WER reduction.
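The relative WER figures quoted in the abstract are conventionally read as the change in WER divided by the baseline WER. The sketch below illustrates that arithmetic only. The function name `relative_wer_change` and the absolute WER values are hypothetical, since the record does not report absolute WERs.

```python
def relative_wer_change(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER change in percent.

    Negative values mean the new model has a lower WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (new_wer - baseline_wer) / baseline_wer


# Hypothetical absolute WERs, chosen only to illustrate the calculation;
# they are not taken from the paper.
baseline = 10.0   # WER (%) of a reference model
candidate = 6.0   # WER (%) of a model being compared against it

change = relative_wer_change(baseline, candidate)
print(f"{change:+.1f}% relative WER change")
```

Under this convention, a relative reduction depends on the baseline: the same absolute gain is a larger relative gain when the baseline WER is lower.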
Pages: 4815-4819
Page count: 5