Comparison of Regularization Constraints in Deep Neural Network based Speaker Adaptation

Cited by: 0
Authors
Shen, Peng [1 ]
Lu, Xugang [1 ]
Kawai, Hisashi [1 ]
Affiliations
[1] Natl Inst Informat & Commun Technol, Tokyo, Japan
Keywords
Elastic net regularization; Lasso constrained regularization; Speaker adaptation; Deep neural networks
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Adaptation of deep neural network (DNN) acoustic models has been shown to significantly improve automatic speech recognition (ASR) performance, but improving the generalization ability of the adapted model remains a challenging problem. In this study, we investigated algorithms for improving model generalization within a parameter regularization framework. Although several regularization algorithms have been proposed, the effects of using different regularization constraints in adaptation (e.g., parameter-space smoothness or sparsity) have not been investigated. We investigated regularization constraints in an l_p regularization framework that includes l_1 and l_2 regularization and several constrained forms of them. We carried out the investigation on a lecture speech recognition task. Our results showed that most of the regularization constraints improved performance, but through different parameter-updating mechanisms. The constraint that makes adaptation select only a few model parameters for updating was the most effective. In addition, combining different regularization constraints yielded further improvements.
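The abstract does not give the paper's formulas, so the following is only a minimal sketch of how l_1, l_2, and combined (elastic net) constraints might act during adaptation. It assumes the penalty is applied to the deviation of the adapted weights from the speaker-independent baseline weights, and that the l_1 term is handled with a proximal (soft-thresholding) step. All function and parameter names (`elastic_net_penalty`, `proximal_step`, `lam1`, `lam2`) are hypothetical and are not taken from the paper.

```python
def elastic_net_penalty(adapted, baseline, lam1, lam2):
    """Elastic net penalty on the deviation from the baseline weights.

    lam1 weights the l_1 term (sparsity of the update),
    lam2 weights the squared l_2 term (smoothness of the update).
    Assumption: the penalty acts on (adapted - baseline), not on the raw weights.
    """
    deltas = [a - b for a, b in zip(adapted, baseline)]
    l1_term = sum(abs(d) for d in deltas)
    l2_term = sum(d * d for d in deltas)
    return lam1 * l1_term + lam2 * l2_term


def soft_threshold(value, threshold):
    """Proximal operator of the l_1 norm: shrink toward zero, zero out small values."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def proximal_step(adapted, baseline, grads, lr, lam1, lam2):
    """One proximal-gradient update of the adapted weights.

    grads holds the gradient of the adaptation loss for each weight
    (hypothetical values supplied by the caller).
    """
    updated = []
    for w, w0, g in zip(adapted, baseline, grads):
        delta = w - w0
        # Gradient step on the data loss plus the smooth squared-l_2 term.
        delta = delta - lr * (g + 2.0 * lam2 * delta)
        # Proximal step for the l_1 term: small deviations snap back to zero.
        delta = soft_threshold(delta, lr * lam1)
        updated.append(w0 + delta)
    return updated


if __name__ == "__main__":
    baseline = [0.5, -0.2, 1.0]
    grads = [1.0, 0.01, -0.02]  # made-up gradients for illustration only
    adapted = proximal_step(baseline, baseline, grads, lr=0.1, lam1=0.5, lam2=0.0)
    changed = sum(1 for w, w0 in zip(adapted, baseline) if w != w0)
    print("updated weights:", adapted)
    print("weights changed:", changed)
```

With `lam2 = 0` the update is purely l_1-constrained, so only weights with a large enough gradient move away from the baseline; with `lam1 = 0` every weight moves but is pulled back toward the baseline. This illustrates the abstract's contrast between sparse and smooth parameter updates without claiming to reproduce the authors' method.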
Pages: 5