DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT

Cited by: 7
Authors
Li, Jingdong [1]
Luo, Dawei [1]
Liu, Yun [1]
Zhu, Yuanyuan [1]
Li, Zhaoxia [1]
Cui, Guohui [1]
Tang, Wenqi [1]
Chen, Wei [1]
Affiliation
[1] Sogou Inc, AI Interact Div, Beijing, People's Republic of China
Keywords
speech enhancement; noise suppression; speech perceptual quality; supervised learning
DOI
10.1109/ICASSP39728.2021.9413967
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Research on single-channel speech enhancement (SE) has a long tradition, but two main practical problems remain unsolved. First, it is hard to balance enhancement quality against computational efficiency, and low latency always comes with a loss of quality. Second, enhancement in specific scenarios, such as singing and emotional speech, remains a difficult problem for conventional methods. In this paper, we propose a computationally efficient real-time speech enhancement network with densely connected multi-stage structures, which progressively enhances the channel-wise subband speech. The enhanced speech from each earlier stage is used to guide the processing of deeper stages in order to obtain coarse-to-fine estimates. In addition, supervision is applied to all intermediate results in order to stabilize training and accelerate convergence. Moreover, an adaptive fine-tuning step is applied using small datasets from specific scenarios, which yields substantial improvement in the corresponding scenes. As a result, the proposed method achieves promising improvements in speech quality and demonstrates robustness in complex scenarios. We submitted the proposed method to the real-time denoising track of the Deep Noise Suppression (DNS) Challenge 2021, held by Microsoft. In the subjective evaluation, our system outperforms the DNS-Challenge baseline by 0.14 points in mean opinion score (MOS).
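The abstract names three ideas: subbands treated as channels, dense connections between stages, and supervision on every intermediate output. The sketch below illustrates only how these ideas fit together on a single frame; it is not the authors' network. The function names, the toy linear-plus-sigmoid stage, the masking formulation, and the mean-squared-error loss are all assumptions made for illustration, since the record does not describe the actual layers or loss.

```python
import math

def split_subbands(frame, num_bands):
    """Split one frame of frequency bins into equal-width subbands (one channel each)."""
    if len(frame) % num_bands != 0:
        raise ValueError("number of bins must be divisible by num_bands")
    width = len(frame) // num_bands
    return [frame[b * width:(b + 1) * width] for b in range(num_bands)]

def toy_stage(channels, weights):
    """Hypothetical stage: linear mix across channels, then a sigmoid mask in (0, 1)."""
    width = len(channels[0])
    masks = []
    for row in weights:  # one weight row per output subband
        mixed = [sum(w * ch[f] for w, ch in zip(row, channels)) for f in range(width)]
        masks.append([1.0 / (1.0 + math.exp(-m)) for m in mixed])
    return masks

def multi_stage_enhance(noisy_bands, stage_weights):
    """Run the stages in order; each stage sees the input plus all earlier outputs."""
    outputs = []
    for weights in stage_weights:
        stage_in = list(noisy_bands)
        for prev in outputs:
            stage_in.extend(prev)  # dense connection (assumed: channel concatenation)
        masks = toy_stage(stage_in, weights)
        enhanced = [[m * x for m, x in zip(mask_row, band)]
                    for mask_row, band in zip(masks, noisy_bands)]
        outputs.append(enhanced)
    return outputs

def deep_supervision_loss(outputs, clean_bands):
    """Sum of per-stage mean squared errors, so every intermediate result is supervised."""
    total = 0.0
    for estimate in outputs:
        squared = [(e - c) ** 2
                   for est_row, clean_row in zip(estimate, clean_bands)
                   for e, c in zip(est_row, clean_row)]
        total += sum(squared) / len(squared)
    return total
```

With `num_bands` subbands, stage `k` (counting from zero) expects a weight matrix with `num_bands` rows and `num_bands * (k + 1)` columns, because its input grows by one set of subbands per earlier stage. In a trained model the stages would be learned networks rather than fixed weight matrices.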
Pages: 6638-6642
Page count: 5
Related papers
50 in total
  • [31] Verification of Real-Time Optimization for Multi-stage Spray Dryer Operation with Polynomial Optimization
    Miklos, Robert
    Petersen, Lars Norbert
    Poulsen, Niels Kjolstad
    Utzen, Christer
    Jorgensen, John Bagterp
    Niemann, Hans Henrik
    [J]. 2018 IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (CCTA), 2018, : 502 - 507
  • [32] A time-frequency fusion model for multi-channel speech enhancement
    Zeng, Xiao
    Xu, Shiyun
    Wang, Mingjiang
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01):
  • [33] Decomposition-based real-time control of multi-stage transfer lines with residence time constraints
    Wang, Feifan
    Ju, Feng
    [J]. IISE TRANSACTIONS, 2021, 53 (09) : 943 - 959
  • [34] MFENet: Multi-level feature enhancement network for real-time semantic segmentation
    Zhang, Boxiang
    Li, Wenhui
    Hui, Yuming
    Liu, Jiayun
    Guan, Yuanyuan
    [J]. NEUROCOMPUTING, 2020, 393 : 54 - 65
  • [35] Investigation of the scavenging process in two-stroke uniflow scavenging marine engines by a real-time multi-stage model
    Liu, Dai
    Han, Xiao
    Liu, Long
    Ma, Xiuzhen
    [J]. FRONTIERS IN ENERGY RESEARCH, 2022, 10
  • [36] Multi-stage residual life prediction of aero-engine based on real-time clustering and combined prediction model
    Liu, Junqiang
    Yu, Zhuoqian
    Zuo, Hongfu
    Fu, Rongchunxue
    Feng, Xiaonan
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2022, 225
  • [37] A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks
    Zhang-Jin Huang
    Xiang-Xiang He
    Fang-Jun Wang
    Qing Shen
    [J]. Journal of Computer Science and Technology, 2021, 36 : 434 - 444
  • [38] A Real-Time Multi-Stage Architecture for Pose Estimation of Zebrafish Head with Convolutional Neural Networks
    Huang, Zhang-Jin
    He, Xiang-Xiang
    Wang, Fang-Jun
    Shen, Qing
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2021, 36 (02) : 434 - 444
  • [39] A Real-Time Maintenance Policy for Multi-Stage Manufacturing Systems Considering Imperfect Maintenance Effects
    Huang, Jing
    Chang, Qing
    Zou, Jing
    Arinez, Jorge
    [J]. IEEE ACCESS, 2018, 6 : 62174 - 62183
  • [40] The multi-channel AR model for real-time audio restoration
    Lin, H
    Godsill, S
    [J]. 2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2005, : 335 - 338