DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT

Cited by: 7
Authors
Li, Jingdong [1]
Luo, Dawei [1]
Liu, Yun [1]
Zhu, Yuanyuan [1]
Li, Zhaoxia [1]
Cui, Guohui [1]
Tang, Wenqi [1]
Chen, Wei [1]
Affiliations
[1] Sogou Inc, AI Interact Div, Beijing, People's Republic of China
Keywords
speech enhancement; noise suppression; speech perceptual quality; supervised learning
DOI
10.1109/ICASSP39728.2021.9413967
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Research on single-channel speech enhancement (SE) has a long tradition, but two main practical problems remain unsolved. First, it is hard to balance enhancement quality against computational efficiency, and low latency always comes at the cost of quality. Second, enhancement in specific scenarios, such as singing and emotional speech, remains a difficult problem for conventional methods. In this paper, we propose a computationally efficient real-time speech enhancement network with densely connected multi-stage structures, which progressively enhances the channel-wise subband speech. The enhanced speech from an earlier stage is used to guide the processing of the deeper stages in order to obtain coarse-to-fine estimates. In addition, supervision is applied to all intermediate results to stabilize training and accelerate convergence. Moreover, an adaptive fine-tuning step is applied with small datasets from specific scenarios, which yields marked improvement in the corresponding scenes. As a result, the proposed method achieves promising improvements in speech quality and demonstrates robustness in complex scenarios. We submitted the proposed method to the real-time denoising track of the Deep Noise Suppression (DNS) Challenge 2021, held by Microsoft. In the subjective evaluation, our system outperforms the DNS-Challenge baseline by 0.14 points in mean opinion score (MOS).
Pages: 6638-6642
Number of pages: 5
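Illustrative sketch
The abstract describes stages that receive earlier estimates (dense connections) and a training signal applied to every intermediate result. The following is a minimal NumPy sketch of that general idea only, not the authors' network: the linear stages, the mean-squared-error loss, the array shapes, and the names `run_stages` and `supervised_loss` are all assumptions made for illustration, since this record does not give the actual layers, features, or loss.

```python
import numpy as np

def run_stages(noisy, stage_weights):
    """Run hypothetical densely connected stages.

    noisy: array of shape (frames, bands), a stand-in for subband features.
    stage_weights: list of matrices; stage k has shape ((k + 1) * bands, bands)
    because it sees the noisy input plus the k earlier estimates.
    Returns the list of per-stage estimates, from coarse to fine.
    """
    estimates = []
    for w in stage_weights:
        # Dense connection: concatenate the input with every earlier estimate.
        stage_input = np.concatenate([noisy] + estimates, axis=1)
        # Assumption: a single linear map stands in for a real stage.
        estimates.append(stage_input @ w)
    return estimates

def supervised_loss(estimates, clean):
    """Sum of per-stage mean squared errors (supervision on all stages)."""
    return sum(float(np.mean((e - clean) ** 2)) for e in estimates)

# Toy data: 4 frames, 3 bands, 3 stages (all sizes are arbitrary).
rng = np.random.default_rng(0)
frames, bands, n_stages = 4, 3, 3
noisy = rng.standard_normal((frames, bands))
clean = rng.standard_normal((frames, bands))
weights = [rng.standard_normal(((k + 1) * bands, bands)) * 0.1
           for k in range(n_stages)]

estimates = run_stages(noisy, weights)
loss = supervised_loss(estimates, clean)
```

Because every stage output contributes to `loss`, gradients in a real training setup would reach the early stages directly rather than only through the final output, which is one common motivation for supervising intermediate results.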