FULLSUBNET: A FULL-BAND AND SUB-BAND FUSION MODEL FOR REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT

Cited by: 77
Authors
Hao, Xiang [1, 2, 3]
Su, Xiangdong [3]
Horaud, Radu [4]
Li, Xiaofei [1, 2]
Affiliations
[1] Westlake Univ, Hangzhou, Peoples R China
[2] Westlake Inst Adv Study, Hangzhou, Peoples R China
[3] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China
[4] Inria Grenoble Rhone Alpes, Montbonnot St Martin, France
Keywords
FullSubNet; Full-band and Sub-band Fusion; Sub-band; Speech Enhancement;
DOI
10.1109/ICASSP39728.2021.9414177
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently: its input consists of one frequency and several context frequencies, and its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In the proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, FullSubNet outperforms the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
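The abstract states that the sub-band model takes one frequency plus several context frequencies as input and processes each frequency independently. The following is a minimal NumPy sketch of how such per-frequency inputs could be assembled from a noisy magnitude spectrogram. The function name `subband_unfold`, the neighbour count `num_neighbors`, and the reflect padding at the spectrum edges are illustrative assumptions, not the authors' released code.

```python
import numpy as np


def subband_unfold(spec: np.ndarray, num_neighbors: int) -> np.ndarray:
    """Build per-frequency sub-band inputs from a magnitude spectrogram.

    spec: array of shape (F, T), with F frequency bins and T frames.
    Returns an array of shape (F, 2 * num_neighbors + 1, T). Slice f holds
    frequency f together with its num_neighbors lower and upper context bins.
    """
    num_freqs, _ = spec.shape
    # Assumption: reflect-pad along the frequency axis so that the lowest and
    # highest bins also receive a full set of context bins.
    padded = np.pad(spec, ((num_neighbors, num_neighbors), (0, 0)), mode="reflect")
    width = 2 * num_neighbors + 1
    # Take a window of `width` bins centred on each original frequency.
    # Each resulting (width, T) slice can be treated as an independent sequence.
    return np.stack([padded[f:f + width] for f in range(num_freqs)], axis=0)


if __name__ == "__main__":
    spec = np.abs(np.random.randn(257, 100))  # toy spectrogram: 257 bins, 100 frames
    sub = subband_unfold(spec, num_neighbors=15)
    print(sub.shape)  # (257, 31, 100)
```

For instance, with a hypothetical `num_neighbors=15`, a 257-bin spectrogram yields 257 inputs of 31 bins each. Feeding every slice to one shared sub-band network is what would make the model process each frequency independently, as the abstract describes.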
Pages: 6633-6637
Number of pages: 5
Related papers
50 in total
  • [21] Learnable spectral dimension compression mapping for full-band speech enhancement
    Hu, Qinwen
    Hou, Zhongshu
    Chen, Kai
    Lu, Jing
    [J]. JASA EXPRESS LETTERS, 2023, 3 (02):
  • [22] Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU
    Matsubara, Keisuke
    Okamoto, Takuma
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Toda, Tomoki
    Shiga, Yoshinori
    Kawai, Hisashi
    [J]. IEEE ACCESS, 2021, 9 : 94923 - 94933
  • [23] Real-time, full-band, online DNN-based voice conversion system using a single CPU
    Saeki, Takaaki
    Saito, Yuki
    Takamichi, Shinnosuke
    Saruwatari, Hiroshi
    [J]. INTERSPEECH 2020, 2020, : 1021 - 1022
  • [24] PERFORMANCE COMPARISON OF REAL-TIME SINGLE-CHANNEL SPEECH DEREVERBERATION ALGORITHMS
    Xiong, Feifei
    Meyer, Bernd T.
    Cauchi, Benjamin
    Jukic, Ante
    Doclo, Simon
    Goetze, Stefan
    [J]. 2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 126 - 130
  • [25] Real-time single-channel deep neural network-based speech enhancement on edge devices
    Shankar, Nikhil
    Bhat, Gautam Shreedhar
    Panahi, Issa M. S.
    [J]. INTERSPEECH 2020, 2020, : 3281 - 3285
  • [26] Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model
    Degottex, Gilles
    Stylianou, Yannis
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2085 - 2095
  • [27] Speech Enhancement using Sub-band Wiener Filter with Pitch Synchronous Analysis
    Sunnydayal, V.
    Kumar, T. Kishore
    [J]. 2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 20 - 25
  • [28] The effect of modified filter distribution on an adaptive, sub-band speech enhancement method
    Darlington, DJ
    Campbell, DR
    [J]. 1996 IEEE DIGITAL SIGNAL PROCESSING WORKSHOP, PROCEEDINGS, 1996, : 153 - 156
  • [29] Binaural sub-band adaptive speech enhancement using artificial neural networks
    Hussain, A
    Campbell, DR
    [J]. SPEECH COMMUNICATION, 1998, 25 (1-3) : 177 - 186
  • [30] Speech enhancement in functional MRI environment using adaptive sub-band algorithms
    Ramachandran, Venkat R.
    Kannan, Govind
    Milani, Ali A.
    Panahi, Issa M. S.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 341 - +