Speech Enhancement using Convolution Neural Network-based Spectrogram Denoising

Cited by: 0

Authors:

Hu Xuhong [1]

Yan Lin-Huang [2]

Lu Xun [3]

Guan Yuan-Sheng [2]

Hu Wenlin [1]

Wang Jie [2,4]

Affiliations:

[1] China Railway Design Corp, Natl Engn Lab Digital Construct & Evaluat Urban R, Tianjin, Peoples R China

[2] Guangzhou Univ, Sch Elect & Commun Engn, Guangzhou, Guangdong, Peoples R China

[3] Guangdong Power Grid Co, Power Grid Planning Ctr, Guangzhou, Guangdong, Peoples R China

[4] Ctr Rd Traff Noise Control, Natl Environm Protect Engn & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF 2021 7TH INTERNATIONAL CONFERENCE ON CONDITION MONITORING OF MACHINERY IN NON-STATIONARY OPERATIONS (CMMNO) | 2021

Keywords:

Speech enhancement; deep learning; convolution neural network; spectrogram denoising; NOISE; EFFICIENT;

DOI:

10.1109/CMMNO53328.2021.9467599

Chinese Library Classification (CLC) number:

TH [Machinery and Instrument Industry];

Discipline classification code:

0802;

Abstract:

Treating the spectrogram as an image, this paper adopts a convolution neural network (CNN)-based image enhancement algorithm for spectrogram denoising, so that speech is denoised when its spectrogram is enhanced. A spectrogram clipping strategy is presented to obtain a large amount of training data; it lowers the storage cost and avoids the limited network depth and excessive complexity commonly encountered when a recurrent neural network is trained on traditional speech features. A deeper network is constructed to increase capacity and flexibility so that the features of the spectrogram are used more fully, and it captures enough spatial information to make noise reduction effective. In addition, the proposed model combines a residual learning strategy with batch normalization during CNN training, which greatly improves its performance. The experimental results demonstrate that the proposed spectrogram denoising model has better learning ability and denoising performance under both known-noise and noise-mismatch conditions, so the proposed system provides robust speech enhancement.
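The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: clipping a spectrogram into many small training patches, and forming residual-learning targets (the noise component rather than the clean spectrogram). The function name `clip_patches`, the patch size of 40, the stride of 20, and the random arrays standing in for spectrograms are all illustrative assumptions, not values from the paper.

```python
import numpy as np


def clip_patches(spec, patch=40, stride=20):
    """Cut a 2-D spectrogram (freq x time) into overlapping square patches.

    `patch` and `stride` are hypothetical values chosen for illustration.
    """
    n_freq, n_time = spec.shape
    patches = [
        spec[f:f + patch, t:t + patch]
        for f in range(0, n_freq - patch + 1, stride)
        for t in range(0, n_time - patch + 1, stride)
    ]
    return np.stack(patches)


# Placeholder arrays standing in for real spectrograms (assumption).
rng = np.random.default_rng(0)
clean = rng.random((120, 200))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

x = clip_patches(noisy)                 # network inputs: noisy patches
residual = clip_patches(noisy - clean)  # residual targets: the noise itself

# A trained CNN would predict `residual` from `x`. Here the true residual
# is substituted to show how the denoised patch is recovered.
denoised = x - residual
```

Under these assumptions one 120 x 200 spectrogram yields 45 overlapping patches, which illustrates how clipping multiplies the number of training samples. In a residual-learning setup the network output is subtracted from the noisy input, as in the last line.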

Pages: 310-318

Page count: 9

