Perceptual MVDR-based Unsupervised Built-in Speaker Normalization for Kazakh Speech Recognition

Times cited: 0
Authors
Yessenbayev, Zhandos [1]
Yapanel, Umit [2]
Affiliations
[1] Nazarbayev Univ Res & Innovat Syst, Astana, Kazakhstan
[2] Yapanel Speech Technol, Sunnyvale, CA USA
Keywords
Unsupervised speaker normalization; Kazakh speech recognition; phone recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we present a novel approach to unsupervised speaker normalization built on top of the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phone recognition on TIMIT, and we then apply it to Kazakh speech recognition. In our experiments, the method improves the performance of ASR systems by up to 20% relative. An analysis of the warp factors that the algorithm selects as optimal reveals a clear separation by speaker gender, which may be useful for gender or speaker classification tasks.
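The abstract does not say how the optimal warp factor is chosen. The sketch below is only a rough illustration. It assumes a first-order all-pass (bilinear) frequency warp controlled by a factor alpha, as is common in PMVDR-style front-ends. For each speaker, it picks alpha by a maximum-likelihood grid search against a speaker-independent model, in the style of VTLN. This is not the authors' algorithm. The following are all hypothetical choices made for illustration:

- the features (spectral peak frequencies);
- the diagonal Gaussian model;
- the candidate grid;
- the names `bilinear_warp`, `gaussian_loglik` and `select_warp_factor`.

```python
import math
from typing import Sequence

# Hypothetical sketch, NOT the authors' algorithm: unsupervised warp-factor
# selection by maximum-likelihood grid search over an all-pass frequency warp.


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized frequency omega in [0, pi] with all-pass factor alpha (|alpha| < 1)."""
    # Frequency mapping induced by (z^-1 - alpha) / (1 - alpha z^-1);
    # alpha > 0 stretches low frequencies, and 0 and pi map to themselves.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega), 1.0 - alpha * math.cos(omega))


def gaussian_loglik(x: Sequence[float], means: Sequence[float], variances: Sequence[float]) -> float:
    """Log-likelihood of one feature vector under a diagonal Gaussian (hypothetical SI model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def select_warp_factor(
    frames: Sequence[Sequence[float]],
    means: Sequence[float],
    variances: Sequence[float],
    candidates: Sequence[float],
) -> float:
    """Return the candidate alpha whose warped frames score highest; no transcripts needed."""

    def total_loglik(alpha: float) -> float:
        # Warp every (hypothetical) spectral-peak frequency, then score the whole speaker.
        return sum(
            gaussian_loglik([bilinear_warp(w, alpha) for w in frame], means, variances)
            for frame in frames
        )

    return max(candidates, key=total_loglik)


if __name__ == "__main__":
    # Toy data: this speaker's peaks match the reference exactly, so the
    # reference warp factor should win the grid search.
    reference_peaks = [0.3, 0.8, 1.4]
    si_means = [bilinear_warp(w, 0.42) for w in reference_peaks]
    grid = [0.36, 0.38, 0.40, 0.42, 0.44, 0.46, 0.48]
    print(select_warp_factor([reference_peaks, reference_peaks], si_means, [0.01] * 3, grid))
```

In the toy run, the speaker's peaks coincide with the reference warped at alpha = 0.42, a value often used to approximate the mel scale at 16 kHz. The search therefore returns 0.42. In a real system, the score would come from the recognizer's acoustic model over all of a speaker's utterances. The per-speaker values of alpha chosen this way are what one would inspect for the gender separation described in the abstract.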
Pages: 87-91
Number of pages: 5