Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 0

Authors:

Xue, Shaofei [1]

Jiang, Hui [2]

Dai, Lirong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China

[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON, Canada

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

Deep Neural Network (DNN); Hybrid DNN/HMM; Speaker Adaptation; singular value decomposition (SVD); TRANSFORMATIONS;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Several speaker adaptation methods have recently been proposed for deep neural networks (DNNs) in large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few of them tune the weight matrices of a trained DNN directly, since doing so is very prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of a trained DNN and then tune only the resulting diagonal matrices of singular values on the adaptation data. This addresses the over-fitting problem, because modifying only the singular values changes the weight matrices slightly. We evaluate the proposed adaptation method on two standard speech recognition tasks: TIMIT phone recognition and large vocabulary speech recognition on Switchboard. Experimental results show that the method can effectively adapt large DNN models using only a small amount of adaptation data. For example, on Switchboard the proposed SVD-based adaptation method may achieve a relative error reduction of up to 3-6% using only a few dozen adaptation utterances per speaker.
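The core idea in the abstract (factor a trained weight matrix with SVD, then update only the singular values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single linear layer, a squared-error loss instead of the DNN training criterion, and plain gradient descent. The function names (`svd_factorize`, `rebuild`, `layer_loss`, `adapt_singular_values`) and the learning-rate and step values are hypothetical choices made for illustration.

```python
import numpy as np


def svd_factorize(W):
    """Factor a trained weight matrix as W = U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt


def rebuild(U, s, Vt):
    """Recombine the factors into a full weight matrix."""
    return (U * s) @ Vt  # U * s scales column j of U by s[j]


def layer_loss(U, s, Vt, X, Y):
    """Half mean squared error of the linear layer X @ W.T against targets Y."""
    pred = X @ rebuild(U, s, Vt).T
    return 0.5 * np.mean(np.sum((pred - Y) ** 2, axis=1))


def adapt_singular_values(U, s, Vt, X, Y, lr=0.1, steps=50):
    """Gradient descent on the singular values only; U and Vt stay fixed.

    Assumption: squared-error loss on a toy linear layer, used here only
    to show that the adapted parameter count equals len(s).
    """
    s = s.copy()
    A = X @ Vt.T  # inputs projected onto the right singular vectors
    for _ in range(steps):
        err = (A * s) @ U.T - Y            # prediction error, shape (n, d_out)
        grad = np.mean(A * (err @ U), axis=0)  # dLoss/ds, shape (k,)
        s -= lr * grad
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((6, 4))        # stand-in for a trained layer
    X = rng.standard_normal((20, 4))       # toy "adaptation data"
    # Toy speaker-specific targets from a slightly perturbed layer.
    Y = X @ (W + 0.1 * rng.standard_normal((6, 4))).T

    U, s, Vt = svd_factorize(W)
    s_adapted = adapt_singular_values(U, s, Vt, X, Y)
    print("loss before:", layer_loss(U, s, Vt, X, Y))
    print("loss after: ", layer_loss(U, s_adapted, Vt, X, Y))
    print("adapted parameters:", s_adapted.size, "of", W.size)
```

In this toy setting only 4 values are updated instead of all 24 weights, which is the property the abstract relies on to limit over-fitting when adaptation data is scarce.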
