Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Cited by: 5
Authors
Zhao, Shengkui [1]
Ni, Chongjia [1]
Tong, Rong [1]
Ma, Bin [1]
Affiliations
[1] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China
Source
Keywords
Robust speech recognition; convolutional neural networks; acoustic model; generative adversarial networks; neural networks
DOI
10.21437/Interspeech.2019-2078
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Robustness of automatic speech recognition (ASR) systems is a critical issue because of noise and reverberation. Speech enhancement and model adaptation have long been studied to address this issue. Recently, multi-task joint-learning schemes that address noise reduction and ASR criteria in a unified modeling framework have shown promising improvements, but their model training relies heavily on paired clean-noisy data. To overcome this limitation, generative adversarial networks (GANs) and adversarial training have been deployed, which greatly simplify model training by removing the need for complex front-end design and paired training data. Despite the fast development of GANs in computer vision, only regular GANs have been adopted for robust ASR. In this work, we adopt a more advanced cycle-consistency GAN (CycleGAN) to address the training failures caused by mode collapse in regular GANs. Using deep residual networks (ResNets), we further expand the multi-task scheme into a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experimental results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system, achieving much lower word error rates (WERs) than state-of-the-art joint-learning approaches.
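For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are assumptions made for illustration; this is not the authors' scoring code or the CHiME-4 scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.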
Pages: 1238-1242
Page count: 5