Perceptual MVDR-based Unsupervised Built-in Speaker Normalization for Kazakh Speech Recognition

Authors
Yessenbayev, Zhandos [1]
Yapanel, Umit [2]
Affiliations
[1] Nazarbayev Univ Res & Innovat Syst, Astana, Kazakhstan
[2] Yapanel Speech Technol, Sunnyvale, CA USA
Keywords
Unsupervised speaker normalization; Kazakh speech recognition; phone recognition;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this work, we present a novel approach to unsupervised speaker normalization built on the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method yields a relative improvement in ASR performance of up to 20%. An analysis of the warp factors selected by the algorithm reveals a clear separation by gender, which may be useful for gender or speaker classification tasks.
Pages: 87-91
Page count: 5