Perceptual MVDR-based Unsupervised Built-in Speaker Normalization for Kazakh Speech Recognition

Cited by: 0

Authors:

Yessenbayev, Zhandos [1]

Yapanel, Umit [2]

Affiliations:

[1] Nazarbayev Univ Res & Innovat Syst, Astana, Kazakhstan

[2] Yapanel Speech Technol, Sunnyvale, CA USA

Source:

2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT) | 2014

Keywords:

Unsupervised speaker normalization; Kazakh speech recognition; phone recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this work, we present a novel approach to unsupervised speaker normalization built on top of the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method can be effective for phonetic recognition on TIMIT, and we then apply it to Kazakh speech recognition. In our experiments, the method yields relative improvements in ASR performance of up to 20%. Analysis of the warp factors the algorithm selects as optimal reveals a clear separation by gender, which may be useful for gender or speaker classification tasks.
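The abstract does not detail how the warp factor is chosen. As a minimal sketch, assuming the PMVDR front end warps the spectrum with a first-order all-pass (bilinear) transform whose coefficient α acts as the warp factor, and assuming that unsupervised selection means a grid search maximizing a per-speaker score computed without transcriptions, the two pieces could look like the Python below. The names `allpass_warp`, `select_warp_factor`, and the `score` callback are hypothetical, the grid values are illustrative, and this is not the authors' implementation.

```python
import math
from typing import Callable, Sequence


def allpass_warp(omega: float, alpha: float) -> float:
    """Warp a normalized frequency omega (radians, 0..pi) through a
    first-order all-pass (bilinear) map with coefficient alpha, |alpha| < 1.

    alpha > 0 stretches low frequencies (mel-like); alpha = 0 is the identity.
    The endpoints 0 and pi are fixed points of the map.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def select_warp_factor(alphas: Sequence[float],
                       score: Callable[[float], float]) -> float:
    """Return the candidate alpha with the highest score.

    `score` is a hypothetical callback (an assumption, not from the paper):
    in practice it could extract features for one speaker's unlabeled audio
    with the given alpha and return their log-likelihood under a
    speaker-independent model. No transcriptions are required.
    """
    if not alphas:
        raise ValueError("need at least one candidate alpha")
    return max(alphas, key=score)


def toy_score(alpha: float) -> float:
    """Toy stand-in for a likelihood score, peaking at alpha = 0.46."""
    return -(alpha - 0.46) ** 2


if __name__ == "__main__":
    # Candidate grid of warp factors 0.40, 0.42, ..., 0.50 (illustrative only).
    grid = [round(0.40 + 0.02 * k, 2) for k in range(6)]
    best = select_warp_factor(grid, toy_score)
    print(best)  # 0.46
    print(allpass_warp(math.pi / 4, best))
```

Picking the maximum over a small discrete grid keeps the search cheap and needs only one scoring pass per candidate; a real system would replace `toy_score` with a model-based score.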

Pages: 87-91

Number of pages: 5

Related records (50 in total):

[31] Garcia, L.; Benitez, C.; Segura, J. C.; Umesh, S. Combining speaker and noise feature normalization techniques for automatic speech recognition. 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2011: 5496-5499.
[32] Xue, Shaofei; Jiang, Hui; Dai, Lirong; Liu, Qingfeng. Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015: 4555-4559.
[33] Wang, Jun; Hahm, Seongjun. Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5, 2015: 2415-2419.
[34] Lawson, A. D.; Huggins, M. C.; Grieco, J. J.; Galligan, S. A.; Harris, D. M. Automatic speech recognition fusion approach to unsupervised speaker clustering and labeling. 2006 IEEE Aerospace Conference, Vols 1-9, 2006: 3280-3285.
[35] Kim, Jae-Bok; Park, Jeong-Sik. Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition. Engineering Applications of Artificial Intelligence, 2016, 52: 126-134.
[36] Chen, Yanxiang; Wang, Qiong. A speaker based unsupervised speech segmentation algorithm used in conversational speech. Knowledge Science, Engineering and Management, 2007, 4798: 396-+.
[37] Huo, Qiang; Zhu, Donglai. Robust speech recognition based on structured modeling, irrelevant variability normalization and unsupervised online adaptation. 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-8, Proceedings, 2009: 4637-+.
[38] Xu, Shupeng; Liu, Yan; Liu, Xiping. Speaker recognition and speech emotion recognition based on GMM. Proceedings of the 3rd International Conference on Electric and Electronics, 2013: 434-436.
[39] Nejadgholi, Isar; Seyyedsalehi, Seyyed Ali. Nonlinear normalization of input patterns to speaker variability in speech recognition neural networks. Neural Computing & Applications, 2009, 18(1): 45-55.
