Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-target Learning for Noisy Speech Recognition

Cited by: 15
Authors
Mimura, Masato [1]
Sakai, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
Speech Recognition; Speech Enhancement; Deep Neural Network (DNN); Denoising Autoencoder (DAE); Deep Neural Networks; Adaptation
DOI
10.21437/Interspeech.2016-388
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Denoising autoencoders (DAEs) have been investigated for enhancing noisy speech before feeding it to the back-end deep neural network (DNN) acoustic model. However, there may be a mismatch between the DAE output and the input the back-end DNN expects, and the training objective functions of the two networks may be inconsistent. This paper proposes a joint optimization method for the front-end DAE and the back-end DNN based on a multi-target learning scheme. In the first step, the front-end DAE is trained with an additional target of minimizing the errors propagated by the back-end DNN. Then, the unified network of DAE and DNN is fine-tuned for the phone state classification target, with an extra target of input speech enhancement imposed on the DAE part. The proposed method was evaluated on the CHiME3 ASR task and shown to improve on both the baseline DNN and the simple coupling of DAE with DNN. The method is also effective as a post-filter of a beamformer.
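The two-step scheme in the abstract can be read as a weighted sum of an enhancement loss and a classification loss, where the primary target switches between the steps. The following is only a minimal sketch of that reading, not the authors' implementation: the abstract does not state the loss functions or the weighting, so the choice of mean squared error for enhancement, softmax cross-entropy for phone state classification, and the names `multi_target_loss`, `primary`, and `aux_weight` are all assumptions made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy of `logits` against an integer class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def multi_target_loss(enhanced, clean, logits, label, primary, aux_weight):
    """Combine the two targets; `aux_weight` scales the secondary one.

    Assumed reading of the abstract (hypothetical names and loss choices):
      primary="enhancement":    step 1, DAE training with the back-end
                                DNN's classification error as an extra target.
      primary="classification": step 2, fine-tuning the unified network with
                                speech enhancement as an extra target.
    """
    enh = mse(enhanced, clean)          # DAE output vs. clean features
    cls = cross_entropy(logits, label)  # DNN output vs. phone state label
    if primary == "enhancement":
        return enh + aux_weight * cls
    if primary == "classification":
        return cls + aux_weight * enh
    raise ValueError("primary must be 'enhancement' or 'classification'")


# Toy values: a 2-dimensional feature frame and a 2-class output layer.
step1 = multi_target_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], 0,
                          primary="enhancement", aux_weight=0.5)
step2 = multi_target_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], 0,
                          primary="classification", aux_weight=0.5)
```

In a real system both losses would be computed over minibatches and backpropagated through the networks; the sketch only shows how the two targets might be combined in each step.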
Pages: 3803-3807
Page count: 5