Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Cited by: 5
Authors
Zhao, Shengkui [1]
Ni, Chongjia [1]
Tong, Rong [1]
Ma, Bin [1]
Affiliations
[1] Alibaba Group, Machine Intelligence Technology, Hangzhou, People's Republic of China
Source
Keywords
Robust speech recognition; convolutional neural networks; acoustic model; generative adversarial networks; NEURAL-NETWORKS;
DOI
10.21437/Interspeech.2019-2078
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Robustness of automatic speech recognition (ASR) systems is a critical issue due to noise and reverberation. Speech enhancement and model adaptation have been studied for a long time to address this issue. Recently, the development of multi-task joint-learning schemes that address noise reduction and ASR criteria in a unified modeling framework has shown promising improvements, but the model training relies heavily on paired clean-noisy data. To overcome this limitation, generative adversarial networks (GANs) and the adversarial training method have been deployed, which greatly simplify the model training process by removing the need for complex front-end design and paired training data. Despite the fast development of GANs for computer vision, only regular GANs have been adopted for robust ASR. In this work, we adopt a more advanced cycle-consistency GAN (CycleGAN) to address the training failure problem caused by mode collapse in regular GANs. Using deep residual networks (ResNets), we further expand the multi-task scheme to a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experimental results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system, achieving much lower word error rates (WERs) than the state-of-the-art joint-learning approaches.
Pages: 1238-1242
Number of pages: 5
Related papers
50 in total
  • [31] Fault Diagnosis of the Rolling Bearing by a Multi-Task Deep Learning Method Based on a Classifier Generative Adversarial Network
    Shen, Zhunan
    Kong, Xiangwei
    Cheng, Liu
    Wang, Rengen
    Zhu, Yunpeng
    [J]. SENSORS, 2024, 24 (04)
  • [32] DEEP NEURAL NETWORKS EMPLOYING MULTI-TASK LEARNING AND STACKED BOTTLENECK FEATURES FOR SPEECH SYNTHESIS
    Wu, Zhizheng
    Valentini-Botinhao, Cassia
    Watts, Oliver
    King, Simon
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4460 - 4464
  • [33] Multi-task learning of deep neural networks for joint automatic speaker verification and spoofing detection
    Li, Jiakang
    Sun, Meng
    Zhang, Xiongwei
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1517 - 1522
  • [34] MULTI-TASK GENERATIVE ADVERSARIAL NETWORKS FOR SEMANTIC-GUIDED REMOTE SENSING IMAGE GENERATION
    Gong, Yushu
    Li, Yuxia
    He, Lei
    Xia, YongQiang
    Yang, Yizhuo
    Tong, Zhonggui
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 5150 - 5153
  • [35] Cell tracking using deep neural networks with multi-task learning
    He, Tao
    Mao, Hua
    Guo, Jixiang
    Yi, Zhang
    [J]. IMAGE AND VISION COMPUTING, 2017, 60 : 142 - 153
  • [36] Rapid Adaptation for Deep Neural Networks through Multi-Task Learning
    Huang, Zhen
    Li, Jinyu
    Siniscalchi, Sabato Marco
    Chen, I-Fan
    Wu, Ji
    Lee, Chin-Hui
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3625 - 3629
  • [37] A Deep Neural Networks Based on Multi-task Learning and Its Application
    Zhao, Mengru
    Zhang, Yuxian
    Qiao, Likui
    Sun, Deyuan
    [J]. 2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 6201 - 6206
  • [38] MULTI-TASK LEARNING FOR SEGMENTATION OF BUILDING FOOTPRINTS WITH DEEP NEURAL NETWORKS
    Bischke, Benjamin
    Helber, Patrick
    Folz, Joachim
    Borth, Damian
    Dengel, Andreas
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1480 - 1484
  • [39] Joint Learning of Image Deblurring and Depth Estimation Through Adversarial Multi-Task Network
    Hou, Shengyu
    Fu, Mengyin
    Song, Wenjie
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7327 - 7341
  • [40] Joint Client Selection and Task Assignment for Multi-Task Federated Learning in MEC Networks
    Cheng, Zhipeng
    Min, Minghui
    Liwang, Minghui
    Gao, Zhibin
    Huang, Lianfen
    [J]. 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,