DEEPFILTERNET: A LOW COMPLEXITY SPEECH ENHANCEMENT FRAMEWORK FOR FULL-BAND AUDIO BASED ON DEEP FILTERING

Cited by: 18
Authors
Schroeter, Hendrik [1]
Escalante-B, Alberto N. [2]
Rosenkranz, Tobias [2]
Maier, Andreas [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Erlangen, Germany
[2] WS Audiol, Res & Dev, Erlangen, Germany
Keywords
deep filtering; speech enhancement;
DOI
10.1109/ICASSP43922.2022.9747055
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Complex-valued processing has brought deep learning-based speech enhancement and signal extraction to a new level. Typically, enhancement is based on a time-frequency (TF) mask applied to a noisy spectrogram, and complex masks (CM) are usually preferred over real-valued masks because they can modify the phase. Recent work proposed using a complex filter instead of a point-wise multiplication with a mask. This makes it possible to incorporate information from previous and future time steps, exploiting local correlations within each frequency band. In this work, we propose DeepFilterNet, a two-stage speech enhancement framework utilizing deep filtering. First, we enhance the spectral envelope using ERB-scaled gains that model human frequency perception. The second stage employs deep filtering to enhance the periodic components of speech. In addition to taking advantage of perceptual properties of speech, we enforce network sparsity via separable convolutions and extensive grouping in linear and recurrent layers to design a low-complexity architecture. We further show that our two-stage deep filtering approach outperforms complex masks over a variety of frequency resolutions and latencies, and we demonstrate convincing performance compared to other state-of-the-art models.
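The difference between a complex mask and a complex filter can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list spectrogram layout `spec[t][f]`, the coefficient layout `coefs[t][i][f]`, the `lookahead` parameter, and the zero-padding at the signal edges are all assumptions made for illustration. In the framework the coefficients would be predicted by a network; here they are simply passed in.

```python
def apply_complex_mask(spec, mask):
    """Point-wise complex mask: Y[t][f] = M[t][f] * X[t][f]."""
    return [[m * x for m, x in zip(mrow, xrow)]
            for mrow, xrow in zip(mask, spec)]


def apply_deep_filter(spec, coefs, lookahead=0):
    """Per-bin complex FIR filter along the time axis (illustrative sketch).

    spec:  spec[t][f], complex spectrogram (frame t, frequency bin f)
    coefs: coefs[t][i][f], complex tap i for frame t and bin f
    Y[t][f] = sum_i coefs[t][i][f] * X[t - i + lookahead][f]
    Frames outside the signal are treated as zero (an assumption).
    """
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = []
    for t in range(n_frames):
        frame = []
        for f in range(n_bins):
            acc = 0j
            for i, taps in enumerate(coefs[t]):
                src = t - i + lookahead  # frame this tap reads from
                if 0 <= src < n_frames:
                    acc += taps[f] * spec[src][f]
            frame.append(acc)
        out.append(frame)
    return out


if __name__ == "__main__":
    # Toy spectrogram: 3 frames, 2 frequency bins.
    spec = [[1 + 1j, 2 + 0j], [0 + 1j, 1 - 1j], [3 + 0j, 0 + 2j]]
    # Two taps of value 1 per bin: each output frame is the sum of the
    # current and the previous input frame within the same bin.
    coefs = [[[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]] for _ in spec]
    for frame in apply_deep_filter(spec, coefs):
        print(frame)
```

With a single tap and no lookahead the filter reduces to the point-wise mask, which is why a complex mask can be viewed as a special case of this kind of filtering. A positive `lookahead` lets the filter read future frames at the cost of added latency.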
Pages: 7407-7411
Page count: 5