DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT

Cited by: 7

Authors:

Li, Jingdong [1]

Luo, Dawei [1]

Liu, Yun [1]

Zhu, Yuanyuan [1]

Li, Zhaoxia [1]

Cui, Guohui [1]

Tang, Wenqi [1]

Chen, Wei [1]

Affiliation:

[1] Sogou Inc, AI Interact Div, Beijing, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

speech enhancement; noise suppression; speech perceptual quality; supervised learning

DOI:

10.1109/ICASSP39728.2021.9413967

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification code:

070206; 082403

Abstract:

Research on single-channel speech enhancement (SE) has a long tradition, but two main practical problems remain unsolved. First, it is hard to balance enhancement quality against computational efficiency, and low latency always comes at some cost in quality. Second, enhancement in specific scenarios, such as singing and emotional speech, remains difficult for conventional methods. In this paper, we propose a computationally efficient real-time speech enhancement network with densely connected multi-stage structures, which progressively enhances channel-wise subband speech. The enhanced speech from earlier stages guides the processing of deeper stages, yielding coarse-to-fine estimates. In addition, supervision is applied to all intermediate results to stabilize training and accelerate convergence. Moreover, an adaptive fine-tuning step on small datasets from specific scenarios yields large improvements in the corresponding scenarios. As a result, the proposed method achieves promising improvements in speech quality and demonstrates robustness in complex scenarios. We submitted the proposed method to the real-time denoising track of the Deep Noise Suppression (DNS) Challenge 2021, organized by Microsoft. In the subjective evaluation, our system outperforms the DNS Challenge baseline by 0.14 points in mean opinion score (MOS).
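The abstract names two mechanisms only at a high level: presenting subband features to the network as channels, and refining the estimate over densely connected stages with supervision on every intermediate result. The Python sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. It assumes equal-width, non-overlapping subbands of a magnitude spectrogram, plain mean squared error as the per-stage loss, and a hypothetical `toy_stage` in place of a trained network; all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]
Spec = List[Frame]  # magnitude spectrogram, indexed [time][frequency]
Stage = Callable[[List[Spec]], Spec]


def to_subband_channels(spec: Spec, num_subbands: int) -> List[Spec]:
    """Split the frequency axis into equal, non-overlapping subbands and
    stack them as channels: shape (T, F) -> (K, T, F // K)."""
    num_bins = len(spec[0])
    if num_bins % num_subbands != 0:
        raise ValueError("frequency bins must be divisible by num_subbands")
    width = num_bins // num_subbands
    return [[frame[k * width:(k + 1) * width] for frame in spec]
            for k in range(num_subbands)]


def from_subband_channels(channels: List[Spec]) -> Spec:
    """Inverse of to_subband_channels: shape (K, T, F // K) -> (T, F)."""
    num_frames = len(channels[0])
    return [[x for ch in channels for x in ch[t]] for t in range(num_frames)]


def run_dense_stages(noisy: Spec, stages: Sequence[Stage]) -> List[Spec]:
    """Densely connected chain: every stage receives the noisy input plus
    all earlier estimates, and returns a refined estimate."""
    inputs: List[Spec] = [noisy]
    estimates: List[Spec] = []
    for stage in stages:
        estimate = stage(inputs)
        estimates.append(estimate)
        inputs = inputs + [estimate]
    return estimates


def mse(a: Spec, b: Spec) -> float:
    """Mean squared error over all time-frequency bins."""
    sq = [(x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb)]
    return sum(sq) / len(sq)


def multi_stage_loss(estimates: List[Spec], target: Spec,
                     weights: Optional[List[float]] = None) -> float:
    """Intermediate supervision: sum a (weighted) loss over every stage's
    estimate instead of supervising only the final output."""
    if weights is None:
        weights = [1.0] * len(estimates)
    return sum(w * mse(e, target) for w, e in zip(weights, estimates))


def toy_stage(inputs: List[Spec]) -> Spec:
    """Hypothetical stand-in for a trained stage: lowers the most recent
    estimate by a fixed 0.25 in every bin."""
    latest = inputs[-1]
    return [[x - 0.25 for x in frame] for frame in latest]


clean = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
noisy = [[1.5, 2.5, 3.5, 4.5], [0.5, 1.5, 0.5, 1.5]]

channels = to_subband_channels(noisy, num_subbands=2)
print(channels)  # 2 channels, each 2 frames x 2 bins
assert from_subband_channels(channels) == noisy

estimates = run_dense_stages(noisy, [toy_stage, toy_stage])
print(multi_stage_loss(estimates, clean))  # 0.0625 (stage 1) + 0.0 (stage 2)
```

Because the loss is summed over stages, an early stage that drifts from the target is penalized directly rather than only through the final output, which is the motivation the abstract gives for supervising intermediate results.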


Pages: 6638-6642

Page count: 5

Related records (50 in total):

[1] DCT based densely connected convolutional GRU for real-time speech enhancement
Jannu, Chaitanya
Vanambathina, Sunny Dayal
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (01): 1195 - 1208
[2] DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
Pandey, Ashutosh
Wang, DeLiang
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6629 - 6633
[3] Multi-stage temporal representation learning via global and local perspectives for real-time speech enhancement
Chau, Hoang Ngoc
Linh, Nguyen Thi Nhat
Doan, Tuan Kiet
Nguyen, Quoc Cuong
[J]. APPLIED ACOUSTICS, 2024, 223
[4] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
Xue, Cheng
Huang, Weilong
Chen, Weiguang
Feng, Jinwei
[J]. INTERSPEECH 2021, 2021: 1862 - 1866
[5] Efficient multi-stage network with pixel-wise degradation prediction for real-time motion deblurring
Hao, Zeyu
Wang, Hang
Zhang, Xuchong
Li, Yuhai
Sun, Hongbin
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 233
[6] A Multi-stage Hierarchical Window Model with Application to Real-Time Graph Analysis
Jayasekara, Sachini
Karunasekera, Shanika
Harwood, Aaron
[J]. 2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017: 2561 - 2564
[7] Two-channel multi-stage speech enhancement for noisy fMRI environment
Montazeri, Vahid
Pathak, Nishank
Panahi, Issa M. S.
[J]. CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING-REVUE CANADIENNE DE GENIE ELECTRIQUE ET INFORMATIQUE, 2013, 36 (02): 60 - 67
[8] Multi-Stage Real-Time identification for Data Stream Events With Drift Feature Based on DTW
Wang, Junlu
Liu, Chengfeng
Ding, Linlin
Luo, Hao
Song, Baoyan
[J]. IEEE ACCESS, 2019, 7: 89188 - 89204
[9] Real-time neural speech enhancement based on temporal refinement network and channel-wise gating methods
Lee, Jinyoung
Kang, Hong-Goo
[J]. DIGITAL SIGNAL PROCESSING, 2023, 133
[10] Real-time DSP implementation of a subband beamforming algorithm for dual microphone speech enhancement
Yermeche, Zohra
Sallberg, Benny
Grbic, Nedelko
Claesson, Ingvar
[J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007: 353 - 356
