Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Cited by: 6
Authors
Li, Ruwei [1]
Sun, Xiaoyue [1]
Liu, Yanan [1]
Yang, Dengcai [1]
Dong, Liang [2]
Affiliations
[1] Beijing Univ Technol, Sch Informat & Commun Engn, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
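Funding
National Natural Science Foundation of China
Keywords
Speech enhancement; Deep neural network; Multi-resolution auditory cepstral coefficient; Adaptive mask; Noise
DOI
10.1186/s13634-019-0618-4
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract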
Existing speech enhancement algorithms perform poorly in non-stationary noise at low signal-to-noise ratio (SNR). To address this problem, this paper presents a novel speech enhancement algorithm based on multiple features and an adaptive mask with deep learning. First, we construct a new feature called the multi-resolution auditory cepstral coefficient (MRACC). This feature, extracted from four cochleagrams of different resolutions, captures both local information and spectrotemporal context while reducing algorithm complexity. Second, we propose an adaptive mask (AM) that tracks changes in the noise. As the SNR changes, the AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM). Third, a deep neural network (DNN) is used as a nonlinear function to estimate the adaptive mask; its input is the MRACC together with its first and second derivatives. Finally, the estimated AM is used to weight the noisy speech to obtain the enhanced speech. Experimental results show that the proposed algorithm improves speech quality and intelligibility and suppresses more noise than the comparison algorithms, while also having lower complexity.
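To make the mask idea concrete, the following is a minimal NumPy sketch of the two standard masks named in the abstract and one possible way of blending them by local SNR. The abstract does not give the paper's AM formula, so the sigmoid weighting, its direction, and the names `blended_mask`, `slope`, `lc_db`, and `beta` are illustrative assumptions, not the authors' definition.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def local_snr_db(speech_pow, noise_pow):
    """Per time-frequency-unit SNR in dB from speech and noise power."""
    return 10.0 * np.log10((speech_pow + EPS) / (noise_pow + EPS))


def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion lc_db, else 0."""
    return (local_snr_db(speech_pow, noise_pow) > lc_db).astype(float)


def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """IRM: (S / (S + N)) ** beta, a soft mask in [0, 1]."""
    return (speech_pow / (speech_pow + noise_pow + EPS)) ** beta


def blended_mask(speech_pow, noise_pow, slope=0.5):
    """Hypothetical SNR-dependent blend of IBM and IRM.

    ASSUMPTION: the weight alpha is a sigmoid of the local SNR; the
    paper's actual adaptive-mask rule is not stated in the abstract.
    """
    snr_db = local_snr_db(speech_pow, noise_pow)
    alpha = 1.0 / (1.0 + np.exp(-slope * snr_db))
    ibm = ideal_binary_mask(speech_pow, noise_pow)
    irm = ideal_ratio_mask(speech_pow, noise_pow)
    return alpha * ibm + (1.0 - alpha) * irm


# Toy time-frequency power values (channels x frames), made up for illustration.
speech = np.array([[100.0, 0.0], [1.0, 4.0]])
noise = np.array([[1e-6, 1.0], [1.0, 1.0]])
mask = blended_mask(speech, noise)
enhanced_pow = mask * (speech + noise)  # weight the noisy power by the mask
```

In a trained system the clean speech and noise powers are unavailable at test time, which is why the abstract has a DNN estimate the mask from noisy-speech features; the functions above only show how a training target of this kind could be computed.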
Pages: 16