Speech Enhancement using Convolution Neural Network-based Spectrogram Denoising

Cited by: 0
Authors
Hu Xuhong [1]
Yan Lin-Huang [2]
Lu Xun [3]
Guan Yuan-Sheng [2]
Hu Wenlin [1]
Wang Jie [2, 4]
Affiliations
[1] China Railway Design Corp, Natl Engn Lab Digital Construct & Evaluat Urban R, Tianjin, Peoples R China
[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Guangdong, Peoples R China
[3] Guangdong Power Grid Co, Power Grid Planning Ctr, Guangzhou, Guangdong, Peoples R China
[4] Ctr Rd Traff Noise Control, Natl Environm Protect Engn & Technol, Beijing, Peoples R China
Keywords
Speech enhancement; deep learning; convolution neural network; spectrogram denoising; NOISE; EFFICIENT;
DOI
10.1109/CMMNO53328.2021.9467599
Chinese Library Classification (CLC) number
TH [Machinery and Instrument Industry];
Discipline classification code
0802;
Abstract
Treating the spectrogram as an image, this paper adopts a convolutional neural network (CNN)-based image enhancement algorithm for spectrogram denoising, so that speech denoising is achieved by enhancing the spectrogram. A spectrogram clipping strategy is presented to obtain a large amount of training data; it reduces storage cost and avoids the limited network depth and excessive complexity commonly encountered when training a recurrent neural network on traditional speech features. Meanwhile, a deeper network is constructed to improve capacity and flexibility so that spectrogram features are used more effectively; it also captures enough spatial information for effective noise reduction. In addition, the proposed model uses a residual learning strategy in CNN training, combined with batch normalization, which greatly improves model performance. The experimental results demonstrate that the proposed spectrogram denoising model has better learning ability and denoising performance under both known-noise and noise-mismatch conditions, so the proposed system shows a robust speech enhancement effect.
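The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it names: clipping a spectrogram into patches to multiply the training data, and residual learning. It assumes one common reading of residual learning, in which the network predicts the noise component and the enhanced patch is the noisy patch minus that prediction; the abstract does not confirm this. The patch size, stride, toy data, and all function names (`clip_spectrogram`, `residual_target`, `denoise`) are hypothetical, and the CNN itself is replaced by a perfect prediction for illustration.

```python
# Hypothetical sketch: patch size, stride, and all names are assumptions,
# not values or identifiers taken from the paper.

def clip_spectrogram(spec, patch=4, stride=2):
    """Cut a 2-D spectrogram (rows = frequency bins, columns = frames)
    into overlapping square patches."""
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f in range(0, n_freq - patch + 1, stride):
        for t in range(0, n_time - patch + 1, stride):
            patches.append([row[t:t + patch] for row in spec[f:f + patch]])
    return patches


def residual_target(noisy, clean):
    """Training target under the assumed residual formulation:
    the noise component (noisy minus clean)."""
    return [[n - c for n, c in zip(nr, cr)] for nr, cr in zip(noisy, clean)]


def denoise(noisy, predicted_residual):
    """Enhanced patch = noisy patch minus the predicted residual."""
    return [[n - r for n, r in zip(nr, rr)]
            for nr, rr in zip(noisy, predicted_residual)]


# Toy 6 x 8 magnitude "spectrograms" with a constant noise offset of 0.5.
clean = [[float(f + t) for t in range(8)] for f in range(6)]
noisy = [[v + 0.5 for v in row] for row in clean]

noisy_patches = clip_spectrogram(noisy)
clean_patches = clip_spectrogram(clean)
targets = [residual_target(n, c) for n, c in zip(noisy_patches, clean_patches)]

# Stand-in for a trained network: use the true residual as the prediction.
restored = denoise(noisy_patches[0], targets[0])
```

With these toy settings a single 6 x 8 spectrogram yields six 4 x 4 training pairs, which illustrates how clipping enlarges the training set without storing additional recordings.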
Pages: 310-318
Page count: 9