SINGULAR VALUE DECOMPOSITION BASED LOW-FOOTPRINT SPEAKER ADAPTATION AND PERSONALIZATION FOR DEEP NEURAL NETWORK

Cited by: 0
Authors
Xue, Jian [1]
Li, Jinyu [1]
Yu, Dong [1]
Seltzer, Mike [1]
Gong, Yifan [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
deep neural network; speaker adaptation; speaker personalization; singular value decomposition
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The large number of parameters in deep neural networks (DNNs) for automatic speech recognition (ASR) makes speaker adaptation very challenging. It also limits the use of speaker personalization because of the huge storage cost in large-scale deployments. In this paper, we address DNN adaptation and personalization by presenting two methods based on the singular value decomposition (SVD). The first method uses an SVD to replace the weight matrix of a speaker-independent DNN with the product of two low-rank matrices. Adaptation is then performed by updating a square matrix inserted between the two low-rank matrices. In the second method, we adapt the full weight matrix but store only the delta matrix, i.e., the difference between the original and adapted weight matrices. We decrease the footprint of the adapted model by storing a reduced-rank version of the delta matrix obtained via an SVD. The proposed methods were evaluated on a short message dictation task. Experimental results show that we can obtain accuracy improvements similar to those of the previously proposed Kullback-Leibler divergence (KLD) regularized method with far fewer parameters, requiring only 0.89% of the original model storage.
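The two methods in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`svd_bottleneck`, `compress_delta`, `restore_adapted`), the matrix sizes, the ranks, and the identity initialization of the inserted square matrix are all assumptions made for illustration, and no adaptation training is performed here.

```python
import numpy as np


def svd_bottleneck(W, k):
    """Method 1 sketch: approximate W (m x n) as U_k @ S @ V_k with a k x k matrix S."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]   # fold the top-k singular values into the left factor
    V_k = Vt[:k, :]
    # Assumed initialization: identity, so U_k @ S @ V_k equals U_k @ V_k
    # before adaptation. Only S would be updated and stored per speaker.
    S = np.eye(k)
    return U_k, S, V_k


def compress_delta(W_si, W_adapted, k):
    """Method 2 sketch: keep a rank-k factorization of the delta matrix."""
    delta = W_adapted - W_si
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]


def restore_adapted(W_si, A, B):
    """Rebuild the (approximate) adapted weights from the stored factors."""
    return W_si + A @ B


# Hypothetical sizes: a 64 x 48 speaker-independent weight matrix.
rng = np.random.default_rng(0)
W_si = rng.standard_normal((64, 48))

# Method 1: per-speaker storage is the k x k matrix S only.
U_k, S, V_k = svd_bottleneck(W_si, k=8)
print("method 1 per-speaker parameters:", S.size, "of", W_si.size)

# Method 2: a synthetic adapted matrix whose delta has rank 4 by construction.
delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 48))
A, B = compress_delta(W_si, W_si + delta, k=4)
print("method 2 per-speaker parameters:", A.size + B.size, "of", W_si.size)
```

In this toy setting the per-speaker footprint drops from m·n values to k² (method 1) or k·(m + n) (method 2). The figures printed here are properties of the made-up sizes only and say nothing about the 0.89% result reported in the abstract.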
Pages: 5
Related papers
50 in total
  • [1] UNSUPERVISED SPEAKER ADAPTATION OF DEEP NEURAL NETWORK BASED ON THE COMBINATION OF SPEAKER CODES AND SINGULAR VALUE DECOMPOSITION FOR SPEECH RECOGNITION
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4555 - 4559
  • [2] NEURAL NETWORK FOR SINGULAR VALUE DECOMPOSITION
    CICHOCKI, A
    [J]. ELECTRONICS LETTERS, 1992, 28 (08) : 784 - 786
  • [3] Restructuring of Deep Neural Network Acoustic Models with Singular Value Decomposition
    Xue, Jian
    Li, Jinyu
    Gong, Yifan
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2364 - 2368
  • [4] Neural network for text classification based on singular value decomposition
    Li, Cheng Hua
    Park, Soon Cheol
    [J]. 2007 CIT: 7TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 47 - 52
  • [5] Interference Recognition Based on Singular Value Decomposition and Neural Network
    Feng Man
    Wang Zinan
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (11) : 2573 - 2578
  • [6] Comparison of Regularization Constraints in Deep Neural Network based Speaker Adaptation
    Shen, Peng
    Lu, Xugang
    Kawai, Hisashi
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [7] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 1 - +
  • [8] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Shaofei Xue
    Hui Jiang
    Lirong Dai
    Qingfeng Liu
    [J]. Journal of Signal Processing Systems, 2016, 82 : 175 - 185
  • [9] Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition
    Xue, Shaofei
    Jiang, Hui
    Dai, Lirong
    Liu, Qingfeng
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 175 - 185
  • [10] INVESTIGATING ONLINE LOW-FOOTPRINT SPEAKER ADAPTATION USING GENERALIZED LINEAR REGRESSION AND CLICK-THROUGH DATA
    Zhao, Yong
    Li, Jinyu
    Xue, Jian
    Gong, Yifan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4310 - 4314