Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 0
Authors
Xue, Shaofei [1]
Jiang, Hui [2]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON, Canada
Keywords
Deep neural network (DNN); hybrid DNN/HMM; speaker adaptation; singular value decomposition (SVD); transformations
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Several speaker adaptation methods have recently been proposed for deep neural networks (DNNs) in large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few of them directly tune the weight matrices of a trained DNN, because doing so is highly prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of a trained DNN and then tune only the resulting diagonal matrices of singular values on the adaptation data. This addresses the over-fitting problem, since modifying only the singular values changes the weight matrices slightly. We evaluate the proposed adaptation method on two standard speech recognition tasks: TIMIT phone recognition and large vocabulary speech recognition on Switchboard. Experimental results show that the method is effective for adapting large DNN models with only a small amount of adaptation data. For example, on Switchboard the proposed SVD-based adaptation method can achieve a relative error reduction of up to 3-6% using only a few dozen adaptation utterances per speaker.
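The core idea in the abstract (factor a trained weight matrix with SVD, freeze the singular vectors, and update only the singular values on adaptation data) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single linear layer and a squared-error objective in place of a full DNN trained with its usual criterion, and the function names `svd_factorize`, `layer_loss`, and `adapt_singular_values` are hypothetical.

```python
import numpy as np


def svd_factorize(W):
    """Factor W (out_dim x in_dim) as U @ diag(s) @ Vt using a thin SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt


def layer_loss(U, s, Vt, X, Y):
    """Toy objective (an assumption): half mean squared error of y = W x."""
    pred = ((X @ Vt.T) * s) @ U.T
    return 0.5 * np.mean(np.sum((pred - Y) ** 2, axis=1))


def adapt_singular_values(U, s, Vt, X, Y, lr=0.1, steps=300):
    """Gradient descent on the singular values only; U and Vt stay frozen."""
    s = s.copy()
    A = X @ Vt.T  # inputs projected onto the right singular vectors
    for _ in range(steps):
        err = (A * s) @ U.T - Y
        # Gradient of layer_loss with respect to each singular value.
        grad = np.mean((err @ U) * A, axis=0)
        s -= lr * grad
    return s


# Toy demo: the "speaker-dependent" target differs from W only in its
# singular values, which is the kind of change this scheme can represent.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))            # stand-in for a trained weight matrix
U, s, Vt = svd_factorize(W)
X = rng.standard_normal((50, 4))           # stand-in adaptation inputs
Y = X @ (U @ np.diag(1.1 * s) @ Vt).T      # stand-in adaptation targets
s_adapted = adapt_singular_values(U, s, Vt, X, Y)
print("adapted parameters:", s_adapted.size, "of", W.size)
print("loss before:", layer_loss(U, s, Vt, X, Y))
print("loss after: ", layer_loss(U, s_adapted, Vt, X, Y))
```

In this toy setting only `min(out_dim, in_dim)` values per layer are updated instead of the full matrix, which is the reduction in free parameters that the abstract credits with limiting over-fitting on small adaptation sets.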
Pages: 1 / +
Number of pages: 3
Related papers
50 in total
  • [1] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Shaofei Xue
    Hui Jiang
    Lirong Dai
    Qingfeng Liu
    [J]. Journal of Signal Processing Systems, 2016, 82 : 175 - 185
  • [2] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): 175 - 185
  • [3] FAST SPEAKER ADAPTATION OF HYBRID NN/HMM MODEL FOR SPEECH RECOGNITION BASED ON DISCRIMINATIVE LEARNING OF SPEAKER CODE
    Abdel-Hamid, Ossama
    Jiang, Hui
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7942 - 7946
  • [4] Model adaptation based on HMM decomposition for reverberant speech recognition
    Takiguchi, T
    Nakamura, S
    Huo, Q
    Shikano, K
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997: 827 - 830
  • [5] UNSUPERVISED SPEAKER ADAPTATION OF DEEP NEURAL NETWORK BASED ON THE COMBINATION OF SPEAKER CODES AND SINGULAR VALUE DECOMPOSITION FOR SPEECH RECOGNITION
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4555 - 4559
  • [6] Speech/speaker recognition using a HMM/GMM hybrid model
    Rodriguez, E
    Ruiz, B
    Garcia-Crespo, A
    Garcia, F
    [J]. AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, 1997, 1206 : 227 - 234
  • [7] On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model
    Soutner, Daniel
    Zelinka, Jan
    Mueller, Ludek
    [J]. SPEECH AND COMPUTER, 2014, 8773 : 315 - 321
  • [8] Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition
    Zhan, Hongjian
    Chen, Guilin
    Lu, Yue
    [J]. PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 427 - 435
  • [9] The hybrid model of speech recognition based on HMM and HMMNN
    Wang Sheguo
    Tong Jianing
    Yuan Yujin
    [J]. 2009 INTERNATIONAL CONFERENCE ON E-BUSINESS AND INFORMATION SYSTEM SECURITY, VOLS 1 AND 2, 2009: 926 - +
  • [10] DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE
    Xue, Shaofei
    Abdel-Hamid, Ossama
    Jiang, Hui
    Dai, Lirong
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,