Deep neural network based speech enhancement using mono channel mask

Cited by: 4
Authors
Ingale, Pallavi P. [1]
Nalbalwar, Sanjay L. [1]
Affiliations
[1] Dr Babasaheb Ambedkar Technol Univ, Lonere, India
Keywords
Speech enhancement; Mono channel mask; Binary mask; Modified sub-harmonic summation; CLASSIFICATION-BASED APPROACH; NOISE;
DOI
10.1007/s10772-019-09627-4
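Chinese Library Classification (CLC) code
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code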
0808 ; 0809 ;
Abstract
Recovering enhanced speech from a noisy speech signal is an important task in speech processing. Here we propose a deep neural network (DNN) based speech enhancement method that uses a mono channel mask. The proposed method employs a cochleagram to find an initial binary mask. A modified sub-harmonic summation algorithm is then applied to the initial binary mask to obtain an intermediate mask. The spectro-temporal features of this intermediate mask are fed to the DNN. The DNN identifies the correct spectral structure in the frames associated with the target speech, which are then used to develop the mono channel mask. The speech signal is reconstructed using the mono channel mask, which avoids unnecessary interference from noisy time-frequency (T-F) units. Objective evaluations using perceptual evaluation of speech quality (PESQ) and normalized source-to-distortion ratio indicate that the proposed method outperforms state-of-the-art speech enhancement methods. The PESQ values obtained show that the proposed method improves speech quality in noisy conditions. The experimental results demonstrate the effectiveness of the mono channel mask in speech enhancement. The proposed method gives better performance than the other methods.
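The abstract's idea of keeping target-dominated T-F units and discarding noisy ones can be illustrated with a generic binary T-F mask. The sketch below is not the authors' mono channel mask or their cochleagram-based procedure: it assumes the per-unit speech and noise energies are known (an "ideal" binary mask), and the function names, the 0 dB threshold `lc_db`, and the toy energy values are all hypothetical choices made for illustration.

```python
import numpy as np

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1.0 for T-F units whose local SNR (dB) exceeds lc_db, else 0.0.

    Assumption: speech and noise energies are available separately,
    which only holds for synthetic mixtures, not for real noisy input.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(np.float64)

def apply_mask(mixture_tf, mask):
    """Zero out the T-F units that the mask marks as noise-dominated."""
    return mixture_tf * mask

# Toy example: 2 frequency channels x 3 time frames (made-up energies).
speech = np.array([[4.0, 0.1, 9.0],
                   [0.2, 5.0, 0.3]])
noise = np.ones_like(speech)

mask = ideal_binary_mask(speech, noise)
masked = apply_mask(speech + noise, mask)
print(mask)
print(masked)
```

Units where the toy speech energy exceeds the noise energy are retained; the rest are set to zero before any resynthesis step.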
Pages: 841 - 850
Page count: 10
Related papers
50 in total
  • [41] Broad Phoneme Class Specific Deep Neural Network Based Speech Enhancement
    Karjol, Pavan
    Ghosh, Prasanta Kumar
    [J]. 2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018), 2018, : 372 - 376
  • [42] Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments
    Gao, Tian
    Du, Jun
    Xu, Yong
    Liu, Cong
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015, 2015, 9237 : 75 - 82
  • [43] A Deep Neural Network Based Kalman Filter for Time Domain Speech Enhancement
    Yu, Hongjiang
    Ouyang, Zhiheng
    Zhu, Wei-Ping
    Champagne, Benoit
    Ji, Yunyun
    [J]. 2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019,
  • [44] Speech enhancement method based on the perceptual joint optimization deep neural network
    Yuan, Wenhao
    Lou, Yingxi
    Liang, Chunyan
    Wang, Zhiqiang
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (02): : 90 - 94
  • [45] SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
    Gao, Tian
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3713 - 3717
  • [46] A Novel Adversarial Training Scheme for Deep Neural Network based Speech Enhancement
    Cornell, Samuele
    Principi, Emanuele
    Squartini, Stefano
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [47] Improved Sparse NMF based Speech Enhancement Method with Deep Neural Network
    Zou, Xia
    Zhang, Xiongwei
    Shi, Wenhua
    Wang, Fupeng
    Zhang, Jingtao
    Gao, Mingyue
    [J]. PROCEEDINGS OF THE 2ND INTERNATIONAL FORUM ON MANAGEMENT, EDUCATION AND INFORMATION TECHNOLOGY APPLICATION (IFMEITA 2017), 2017, 130 : 231 - 234
  • [48] Effect of spectrogram resolution on deep-neural-network-based speech enhancement
    Takeuchi, Daiki
    Yatabe, Kohei
    Koizumi, Yuma
    Oikawa, Yasuhiro
    Harada, Noboru
    [J]. ACOUSTICAL SCIENCE AND TECHNOLOGY, 2020, 41 (05) : 769 - 775
  • [49] GLOBAL VARIANCE EQUALIZATION FOR IMPROVING DEEP NEURAL NETWORK BASED SPEECH ENHANCEMENT
    Xu, Yong
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP), 2014, : 71 - 75
  • [50] A STUDY OF TRAINING TARGETS FOR DEEP NEURAL NETWORK-BASED SPEECH ENHANCEMENT USING NOISE PREDICTION
    Odelowo, Babafemi O.
    Anderson, David V.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5409 - 5413