Joint Time-Frequency and Time Domain Learning for Speech Enhancement

Authors
Tang, Chuanxin [1]
Luo, Chong [1]
Zhao, Zhiyuan [1]
Xie, Wenxuan [1]
Zeng, Wenjun [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For single-channel speech enhancement, time-domain and time-frequency-domain (T-F-domain) methods each have their own pros and cons. In this paper, we present a cross-domain framework named TFT-Net, which takes a time-frequency spectrogram as input and produces a time-domain waveform as output. Such a framework takes advantage of the knowledge we have about spectrograms and avoids some of the drawbacks that T-F-domain methods have been suffering from. In TFT-Net, we design an innovative dual-path attention block (DAB) to fully exploit correlations along the time and frequency axes. We further discover that a sample-independent DAB (SDAB) achieves a good trade-off between enhanced speech quality and complexity. Ablation studies show that both the cross-domain design and the SDAB block bring large performance gains. When logarithmic MSE is used as the training criterion, TFT-Net achieves the highest SDR and SSNR among state-of-the-art methods on two major speech enhancement benchmarks.
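The abstract names a logarithmic MSE training criterion and two evaluation metrics, SDR and SSNR, without defining them. The sketch below uses commonly seen waveform-domain definitions and is not taken from the paper: the function names, the stabilizing constant `EPS`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are all assumptions made for illustration.

```python
import math

EPS = 1e-8  # assumed small constant that keeps the logarithms finite


def log_mse(clean, enhanced):
    """Base-10 logarithm of the mean squared error between two waveforms."""
    mse = sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)
    return math.log10(mse + EPS)


def sdr_db(clean, enhanced):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(c * c for c in clean)
    distortion = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return 10.0 * math.log10((signal + EPS) / (distortion + EPS))


def ssnr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR: per-frame ratio in dB, clamped to [lo, hi], then averaged.

    frame_len, lo and hi are assumed values, not the paper's settings.
    """
    if len(clean) < frame_len:
        raise ValueError("signal is shorter than one frame")
    values = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        values.append(min(hi, max(lo, sdr_db(c, e))))
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy signals: a sine wave and a copy attenuated by 10 percent.
    clean = [math.sin(2 * math.pi * n / 50) for n in range(1024)]
    enhanced = [0.9 * c for c in clean]
    print("log-MSE:", log_mse(clean, enhanced))
    print("SDR (dB):", sdr_db(clean, enhanced))
    print("SSNR (dB):", ssnr_db(clean, enhanced))
```

In this toy case the residual is one tenth of the clean signal, so the energy ratio is 100 and both SDR and SSNR come out at about 20 dB. Published results typically rely on established evaluation toolkits, whose SDR definition can differ from the simple energy ratio used here.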
Pages: 3816-3822
Number of pages: 7