FULLSUBNET: A FULL-BAND AND SUB-BAND FUSION MODEL FOR REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT

Cited by: 77
Authors
Hao, Xiang [1 ,2 ,3 ]
Su, Xiangdong [3 ]
Horaud, Radu [4 ]
Li, Xiaofei [1 ,2 ]
Affiliations
[1] Westlake Univ, Hangzhou, Peoples R China
[2] Westlake Inst Adv Study, Hangzhou, Peoples R China
[3] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
[4] Inria Grenoble Rhone Alpes, Montbonnot St Martin, France
Keywords
FullSubNet; Full-band and Sub-band Fusion; Sub-band; Speech Enhancement;
DOI
10.1109/ICASSP39728.2021.9414177
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several context frequencies, and its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In the proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
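The sub-band input described in the abstract (one frequency plus several context frequencies) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `unfold_sub_bands`, the parameter `num_neighbors`, the choice of a symmetric context on both sides, and the reflect padding at the band edges are all assumptions made for illustration.

```python
import numpy as np


def unfold_sub_bands(spec, num_neighbors=2):
    """Build one sub-band unit per frequency bin (illustrative sketch).

    spec: array of shape (F, T), e.g. a noisy magnitude spectrogram.
    num_neighbors: hypothetical number of context bins on each side.
    Returns an array of shape (F, 2 * num_neighbors + 1, T), where
    unit f holds bin f in the middle and its neighbors around it.
    """
    num_freqs, _ = spec.shape
    # Assumption: pad the frequency axis by reflection so edge bins
    # also receive a full context window.
    padded = np.pad(spec, ((num_neighbors, num_neighbors), (0, 0)), mode="reflect")
    width = 2 * num_neighbors + 1
    # Slide a window of `width` bins along the frequency axis.
    return np.stack([padded[f:f + width] for f in range(num_freqs)], axis=0)


# Toy spectrogram with 4 frequency bins and 3 frames.
spec = np.arange(12, dtype=float).reshape(4, 3)
units = unfold_sub_bands(spec, num_neighbors=1)
print(units.shape)
```

Each of the resulting units would be fed to the sub-band model independently, which is one way to read "processes each frequency independently" in the abstract.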
Pages: 6633-6637
Number of pages: 5