MULTI-ACCENT SPEECH RECOGNITION WITH HIERARCHICAL GRAPHEME BASED MODELS

Cited by: 0

Authors:

Rao, Kanishka [1]

Sak, Hasim [1]

Affiliation:

[1] Google Inc., Speech Group, Mountain View, CA 94043, USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

deep neural networks; grapheme; acoustic modelling; CTC;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. The models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer in a multi-task learning setting. Using the grapheme predictions from a hierarchical model trained on 3 million US English utterances results in a 6.7% relative word error rate (WER) increase compared to using the phoneme-based acoustic model trained on the same data. However, we show that hierarchical grapheme-based models trained jointly for the grapheme and phoneme prediction tasks on larger acoustic data (12 million utterances) outperform the phoneme-only model by 6.9% relative WER. We train a single multi-dialect model using a combined US, British, Indian and Australian English data set and then adapt the model using US English data only. This adapted multi-accent model outperforms a model trained exclusively on US English. This process is repeated for phoneme-based and grapheme-based acoustic models for all four dialects, and larger improvements are obtained with grapheme models. Additionally, using a multi-accent grapheme model, we observe large recognition accuracy improvements for Indian-accented utterances in Google VoiceSearch US traffic, with a 40% relative WER reduction.
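The abstract reports its results as relative WER changes. As a minimal sketch of that arithmetic (the function name and the absolute WER values below are hypothetical and do not come from the paper), a relative change can be computed as the difference from the baseline divided by the baseline:

```python
def relative_wer_change(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER change in percent.

    A negative value means the new model has a lower WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (new_wer - baseline_wer) / baseline_wer


# Hypothetical absolute WERs, chosen only to illustrate the arithmetic;
# they are not results reported in the paper.
baseline = 10.0   # e.g. a phoneme-based model
candidate = 9.0   # e.g. a grapheme-based model
change = relative_wer_change(baseline, candidate)
print(f"{change:+.1f}% relative WER change")
```

Under this convention, a "relative WER increase" is a positive value and a "relative WER reduction" is a negative one.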

Pages: 4815-4819

Page count: 5

