Perceptual MVDR-based Unsupervised Built-in Speaker Normalization for Kazakh Speech Recognition

Cited by: 0

Authors:

Yessenbayev, Zhandos [1]

Yapanel, Umit [2]

Affiliations:

[1] Nazarbayev Univ Res & Innovat Syst, Astana, Kazakhstan

[2] Yapanel Speech Technol, Sunnyvale, CA USA

Source:

2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT) | 2014

Keywords:

Unsupervised speaker normalization; Kazakh speech recognition; phone recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

In this work, we present a novel approach to unsupervised speaker normalization built on top of the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method is effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method yields relative improvements in ASR performance of up to 20%. Analysis of the optimal warp factors selected by the algorithm reveals a clear separation between genders, which may be useful for gender or speaker classification tasks.
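The abstract does not describe how a warp factor is chosen without transcriptions. As a rough illustration only, the sketch below shows a generic, VTLN-style grid search: for each candidate warp factor, a speaker's spectra are read out at frequencies warped by a first-order all-pass (bilinear) map, scored under a speaker-independent diagonal-Gaussian model, and the highest-scoring factor is kept. The bilinear warp, the toy Gaussian model, the synthetic spectral envelope, and all function names (`bilinear_warp`, `select_warp_factor`, `demo`, etc.) are assumptions made for this example; it is not the authors' PMVDR-based implementation.

```python
import math
from typing import List, Sequence


def bilinear_warp(omega: float, alpha: float) -> float:
    """Map normalized frequency omega in [0, pi] through a first-order
    all-pass (bilinear) warp with parameter alpha, assuming |alpha| < 1."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def sample_spectrum(spectrum: Sequence[float], omega: float) -> float:
    """Linearly interpolate a spectrum sampled uniformly on [0, pi] at omega."""
    n = len(spectrum)
    pos = min(max(omega, 0.0), math.pi) / math.pi * (n - 1)  # clamp, then map to bin index
    i = min(int(pos), n - 2)
    frac = pos - i
    return (1.0 - frac) * spectrum[i] + frac * spectrum[i + 1]


def warped_features(spectrum: Sequence[float], alpha: float, num_points: int) -> List[float]:
    """Hypothetical front end: read the spectrum at num_points warped frequencies."""
    grid = [math.pi * k / (num_points - 1) for k in range(num_points)]
    return [sample_spectrum(spectrum, bilinear_warp(w, alpha)) for w in grid]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def select_warp_factor(frames, mean, var, alphas):
    """Unsupervised selection: no transcript is used; keep the alpha whose
    warped features score highest under the speaker-independent model."""
    best_alpha, best_score = None, -math.inf
    for alpha in alphas:
        score = sum(diag_gauss_loglik(warped_features(f, alpha, len(mean)), mean, var)
                    for f in frames)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


def demo() -> float:
    # Synthetic smooth "reference" envelope (toy data, not speech).
    def ref(w: float) -> float:
        return math.sin(3.0 * w) + 0.5 * math.cos(7.0 * w)

    fine = [math.pi * i / 512 for i in range(513)]
    true_alpha = 0.04
    # Simulated speaker: the reference envelope read at frequencies warped by -true_alpha.
    speaker = [ref(bilinear_warp(w, -true_alpha)) for w in fine]

    num_points = 24
    mean = [ref(math.pi * k / (num_points - 1)) for k in range(num_points)]
    var = [0.01] * num_points
    alphas = [round(-0.1 + 0.02 * i, 2) for i in range(11)]  # -0.10 ... 0.10
    return select_warp_factor([speaker], mean, var, alphas)


if __name__ == "__main__":
    print(demo())  # prints 0.04
```

The demo simulates a speaker whose envelope is the reference warped by -0.04, and the grid search recovers 0.04 because bilinear warps compose: warping by alpha and then by -alpha returns the original frequency axis. A real recognizer would score cepstral features under a trained acoustic model rather than raw envelope samples.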


Pages: 87-91

Number of pages: 5
