MULTI-TASK JOINT-LEARNING OF DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION

Cited by: 0
Authors
Qian, Yanmin [1 ,2 ]
Yin, Maofan [1 ]
You, Yongbin [1 ]
Yu, Kai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
Robust speech recognition; Deep neural network; Feature denoising; Multi-task; Noise aware training;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although deep neural networks (DNNs) have achieved great success in automatic speech recognition (ASR), their performance still degrades significantly in noisy environments. This paper proposes a novel multi-task joint-learning framework to improve the noise robustness of speech recognition. The architecture integrates two different DNNs, a regressive denoising DNN and a discriminative recognition DNN, into a single multi-task structure whose parameters are all optimized jointly from the very beginning of model training. The basic multi-task structure is then explored further and reorganized into a more general framework that yields substantial gains. Noise adaptive training can also be easily incorporated into this architecture for further improvement. Experiments on the Aurora4 task show that the proposed approach achieves a WER below 10% without adaptation or sequence training, a relative improvement of more than 20% over a strong DNN-HMM baseline.
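The joint objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a single shared ReLU hidden layer feeding two output heads, the hypothetical helper names `init_layer` and `joint_loss`, the layer sizes, and the interpolation weight `alpha`. It shows only how a denoising regression loss and a recognition classification loss can be combined into one scalar, so that gradients from both tasks reach the shared parameters.

```python
import numpy as np

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one affine layer (illustrative)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def joint_loss(params, noisy, clean, labels, alpha=0.5):
    """Return (total, cross_entropy, mse) for a two-head multi-task network.

    alpha is a hypothetical task weight: 0 keeps only the recognition loss,
    1 keeps only the denoising loss.
    """
    (w_h, b_h), (w_den, b_den), (w_rec, b_rec) = params
    hidden = np.maximum(0.0, noisy @ w_h + b_h)   # shared hidden layer (ReLU assumed)
    denoised = hidden @ w_den + b_den             # regression head: estimate clean features
    logits = hidden @ w_rec + b_rec               # classification head: state scores
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    mse = ((denoised - clean) ** 2).mean()
    return (1.0 - alpha) * ce + alpha * mse, ce, mse

# Toy data: 5 frames of 8-dimensional features and 4 output classes (all made up).
rng = np.random.default_rng(0)
params = [init_layer(rng, 8, 16), init_layer(rng, 16, 8), init_layer(rng, 16, 4)]
noisy = rng.normal(size=(5, 8))
clean = rng.normal(size=(5, 8))
labels = rng.integers(0, 4, size=5)
total, ce, mse = joint_loss(params, noisy, clean, labels, alpha=0.3)
```

In an actual training setup the scalar `total` would be minimised by backpropagation, which is what lets both tasks shape the shared layer from the first update onward.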
Pages: 310-316
Page count: 7