Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Cited by: 6
Authors
Li, Ruwei [1]
Sun, Xiaoyue [1]
Liu, Yanan [1]
Yang, Dengcai [1]
Dong, Liang [2]
Affiliations
[1] Beijing Univ Technol, Sch Informat & Commun Engn, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
Funding
National Natural Science Foundation of China
Keywords
Speech enhancement; Deep neural network; Multi-resolution auditory cepstral coefficient; Adaptive mask; Noise
DOI
10.1186/s13634-019-0618-4
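Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract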
Existing speech enhancement algorithms perform poorly in non-stationary noise at low signal-to-noise ratio (SNR). To address this problem, this paper presents a speech enhancement algorithm based on multiple features and an adaptive mask with deep learning. First, we construct a new feature, the multi-resolution auditory cepstral coefficient (MRACC). Extracted from four cochleagrams of different resolutions, this feature captures both local information and spectrotemporal context while reducing algorithm complexity. Second, we propose an adaptive mask (AM) that tracks changes in the noise. The AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM) as the SNR changes. Third, a deep neural network (DNN) is used as a nonlinear function to estimate the AM, with the MRACC and its first and second derivatives as the DNN input. Finally, the estimated AM is used to weight the noisy speech to obtain the enhanced speech. Experimental results show that the proposed algorithm improves speech quality and intelligibility and suppresses more noise than the comparison algorithms, while also having lower complexity.
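The mask-combination idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the AM formula, so the sigmoid weight `alpha`, its `slope` parameter, the 0 dB IBM threshold, and the IRM exponent `beta` below are all assumptions made for illustration, not the authors' definitions. The sketch computes an IBM and an IRM for one time-frequency unit, blends them with an SNR-dependent weight, and then weights the noisy signal with the resulting mask.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def local_snr_db(speech_energy, noise_energy):
    """Local SNR in dB for one time-frequency unit."""
    return 10.0 * math.log10((speech_energy + EPS) / (noise_energy + EPS))


def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """IBM: 1 if the local SNR exceeds the threshold, else 0."""
    snr_db = local_snr_db(speech_energy, noise_energy)
    return 1.0 if snr_db > threshold_db else 0.0


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """IRM: speech-to-mixture energy ratio raised to the power beta."""
    return (speech_energy / (speech_energy + noise_energy + EPS)) ** beta


def adaptive_mask(speech_energy, noise_energy, slope=0.5):
    """Hypothetical AM: an SNR-weighted blend of the IRM and the IBM.

    ASSUMPTION: the sigmoid weight and the blend direction are illustrative
    choices; the paper's actual combination rule may differ.
    """
    snr_db = local_snr_db(speech_energy, noise_energy)
    alpha = 1.0 / (1.0 + math.exp(-slope * snr_db))
    ibm = ideal_binary_mask(speech_energy, noise_energy)
    irm = ideal_ratio_mask(speech_energy, noise_energy)
    return alpha * irm + (1.0 - alpha) * ibm


def apply_mask(mask, noisy):
    """Weight each time-frequency unit of the noisy signal by its mask value."""
    return [[m * x for m, x in zip(mask_row, noisy_row)]
            for mask_row, noisy_row in zip(mask, noisy)]
```

In the algorithm the abstract describes, the mask is estimated by a DNN from the MRACC features rather than computed from the clean speech and noise energies; the oracle computation here would only serve as a training target under the stated assumptions.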
Pages: 16
Related papers
50 in total
  • [1] Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network
    Ruwei Li
    Xiaoyue Sun
    Yanan Liu
    Dengcai Yang
    Liang Dong
    [J]. EURASIP Journal on Advances in Signal Processing, 2019
  • [2] Plastic multi-resolution auditory model based neural network for speech enhancement
    Lai, Chen-Yen
    Lo, Yu-Wen
    Shen, Yih-Liang
    Chi, Tai-Shih
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 605 - 609
  • [3] Speech enhancement based on auditory cepstral coefficient with deep learning
    Li, Ruwei
    Sun, Xiaoyue
    Liu, Yanan
    Li, Tao
    [J]. Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2019, 47 (09): : 78 - 83
  • [4] Speech signal enhancement based on adaptive multi-resolution form of SVD
    Lu Yanhong
    Qin Xiaohong
    [J]. CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 2, PROCEEDINGS, 2008, : 137 - 140
  • [5] Monaural speech enhancement combining accurate ratio mask and deep neural network
    BAI Haojun
    ZHANG Tianqi
    LIU Jianxing
    YE Shaopeng
    [J]. Chinese Journal of Acoustics, 2022, 41 (04) : 373 - 389
  • [7] Deep neural network based speech enhancement using mono channel mask
    Ingale, Pallavi P.
    Nalbalwar, Sanjay L.
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (03) : 841 - 850
  • [8] Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT
    Toledano, Doroteo T.
    Pilar Fernandez-Gallego, Maria
    Lozano-Diez, Alicia
    [J]. PLOS ONE, 2018, 13 (10):
  • [9] Multi-resolution cepstral features for phoneme recognition across speech sub-bands
    McCourt, P
    Vaseghi, S
    Harte, N
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 557 - 560
  • [10] A dyadic multi-resolution deep convolutional neural wavelet network for image classification
    Ejbali, Ridha
    Zaied, Mourad
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (05) : 6149 - 6163