Time-frequency masking based supervised speech enhancement framework using fuzzy deep belief network

Cited by: 25
Authors
Samui, Suman [1]
Chakrabarti, Indrajit [2]
Ghosh, Soumya K. [3]
Affiliations
[1] Indian Inst Technol, Adv Technol Dev Ctr, Kharagpur, W Bengal, India
[2] Indian Inst Technol, Dept Elect & Elect Commun Engn, Kharagpur, W Bengal, India
[3] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur, W Bengal, India
Keywords
Speech enhancement; Speech processing; Deep belief network; Deep learning; Restricted Boltzmann machine (RBM); Fuzzy restricted Boltzmann machine (FRBM); Time-frequency masking; NEURAL-NETWORKS; PHASE ESTIMATION; NOISE; INTELLIGIBILITY; SEPARATION; ALGORITHM
DOI
10.1016/j.asoc.2018.10.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, supervised speech enhancement methods based on deep learning have attracted considerably more research attention than methods based on statistical signal processing. In this study, we consider the time-frequency masking based deep learning framework for speech enhancement and investigate how its performance can be improved further. Our main finding is that a significant performance improvement can be achieved when the deep neural network (DNN) is pre-trained using Fuzzy Restricted Boltzmann Machines (FRBM) rather than regular Restricted Boltzmann Machines (RBM), mainly because the FRBM is more robust and effective when the training data is noisy. To train an FRBM, we adopt a defuzzification method based on the crisp probabilistic mean value of fuzzy numbers. The detailed theory of the FRBM training strategy is presented for different fuzzy membership functions such as Symmetric Triangular Fuzzy Numbers (STFN) and Asymmetric Triangular Fuzzy Numbers (ATFN). Furthermore, we evaluate the proposed training strategies on different DNN-based Speech Enhancement Systems (SES) built on different training targets such as the Complex Ideal Ratio Mask (cIRM), the Ideal Ratio Mask (IRM) and the Phase-Sensitive Mask (PSM). Experimental results across various noise scenarios show that the DNN-based speech enhancement system trained with the proposed approach consistently improves objective measures of perceived speech quality and intelligibility compared with conventional DNN-based speech enhancement methods that use a regular RBM for unsupervised pre-training.
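For readers unfamiliar with the three training targets named in the abstract, the following is a minimal sketch of how they are commonly defined in the speech enhancement literature for a single time-frequency bin. It is not taken from the paper: the function names, the `EPS` constant, and the toy bin values are assumptions made for illustration, and the paper may use compressed or truncated variants of these masks.

```python
import math

EPS = 1e-12  # assumed guard against division by zero in silent bins


def ideal_ratio_mask(s: complex, n: complex) -> float:
    """IRM for one bin: sqrt(|S|^2 / (|S|^2 + |N|^2)), a value in [0, 1]."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return math.sqrt(ps / (ps + pn + EPS))


def phase_sensitive_mask(s: complex, y: complex) -> float:
    """PSM for one bin: (|S| / |Y|) * cos(theta_S - theta_Y) = Re(S / Y)."""
    return (s * y.conjugate()).real / (abs(y) ** 2 + EPS)


def complex_ideal_ratio_mask(s: complex, y: complex) -> complex:
    """cIRM for one bin: the complex ratio S / Y, so that cIRM * Y = S."""
    return s * y.conjugate() / (abs(y) ** 2 + EPS)


# Toy STFT bins (made-up values): clean speech S, noise N, mixture Y = S + N.
clean = [3 + 4j, 1 - 1j, 0.5 + 0j]
noise = [0 + 0j, 1 + 1j, -2 + 1j]
noisy = [s + n for s, n in zip(clean, noise)]

irm = [ideal_ratio_mask(s, n) for s, n in zip(clean, noise)]
psm = [phase_sensitive_mask(s, y) for s, y in zip(clean, noisy)]
cirm = [complex_ideal_ratio_mask(s, y) for s, y in zip(clean, noisy)]

# Multiplying the mixture by the cIRM recovers the clean bin (up to EPS).
recovered = [m * y for m, y in zip(cirm, noisy)]
```

The IRM uses only magnitudes, the PSM additionally accounts for the phase difference between clean and noisy speech (and can therefore be negative), and the cIRM is complex-valued so that it can correct both magnitude and phase.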
Pages: 583-602
Page count: 20
Related papers
50 in total
  • [1] TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK
    Soni, Meet H.
    Shah, Neil
    Patil, Hemant A.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5039 - 5043
  • [2] PHASE TIME-FREQUENCY MASKING BASED SPEECH ENHANCEMENT ALGORITHM USING CIRCULAR MICROPHONE ARRAY
    He, Li
    Zhou, Yi
    Liu, Hongqing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 808 - 813
  • [3] Robust speech separation using time-frequency masking
    Aarabi, P
    Shi, GJ
    Jahromi, O
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 741 - 744
  • [4] On time-frequency masking in voiced speech
    Skoglund, J
    Kleijn, WB
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 361 - 369
  • [5] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    [J]. 2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 445 - 450
  • [6] Time-frequency mask estimation-based speech enhancement using deep encoder-decoder neural network
    SHI Wenhua
    ZHANG Xiongwei
    ZOU Xia
    SUN Meng
    LI Li
    REN Zhengbing
    [J]. Chinese Journal of Acoustics, 2021, 40 (01) : 141 - 154
  • [7] A time-frequency smoothing neural network for speech enhancement
    Yuan, Wenhao
    [J]. SPEECH COMMUNICATION, 2020, 124 : 75 - 84
  • [8] MULTICHANNEL SPEECH ENHANCEMENT BASED ON TIME-FREQUENCY MASKING USING SUBBAND LONG SHORT-TERM MEMORY
    Li, Xiaofei
    Horaud, Radu
    [J]. 2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 298 - 302
  • [9] Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network
    Shah, Neil
    Patil, Hemant A.
    Soni, Meet H.
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1246 - 1251
  • [10] Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
    Yang Yu
    Wenwu Wang
    Peng Han
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2016