Joint Time-Frequency and Time Domain Learning for Speech Enhancement

Cited by: 0
Authors
Tang, Chuanxin [1]
Luo, Chong [1]
Zhao, Zhiyuan [1]
Xie, Wenxuan [1]
Zeng, Wenjun [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For single-channel speech enhancement, both time-domain and time-frequency-domain methods have their respective pros and cons. In this paper, we present a cross-domain framework named TFT-Net, which takes a time-frequency spectrogram as input and produces a time-domain waveform as output. Such a framework takes advantage of the knowledge we have about spectrograms and avoids some of the drawbacks that T-F-domain methods have been suffering from. In TFT-Net, we design an innovative dual-path attention block (DAB) to fully exploit correlations along the time and frequency axes. We further discover that a sample-independent DAB (SDAB) achieves a good trade-off between enhanced speech quality and complexity. Ablation studies show that both the cross-domain design and the SDAB block bring large performance gains. When logarithmic MSE is used as the training criterion, TFT-Net achieves the highest SDR and SSNR among state-of-the-art methods on two major speech enhancement benchmarks.
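The abstract names a logarithmic MSE training criterion and the SDR metric without defining them. The sketch below is only an illustration under common textbook assumptions, not the authors' implementation: `log_mse` takes the base-10 logarithm of the mean squared waveform error, and `sdr_db` uses the plain energy-ratio form of SDR. Both function names, the `eps` stabilizer, and the choice of logarithm base are hypothetical.

```python
import math


def log_mse(clean, enhanced, eps=1e-8):
    """Base-10 log of the mean squared error between two waveforms.

    Assumed form of a "logarithmic MSE" loss; the paper may define it differently.
    """
    if len(clean) != len(enhanced):
        raise ValueError("waveforms must have the same length")
    mse = sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)
    # eps (an assumption) keeps the logarithm finite when the error is zero.
    return math.log10(mse + eps)


def sdr_db(clean, enhanced, eps=1e-8):
    """Signal-to-distortion ratio in dB, using the simple energy-ratio definition."""
    if len(clean) != len(enhanced):
        raise ValueError("waveforms must have the same length")
    signal_energy = sum(c * c for c in clean)
    distortion_energy = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    enhanced = [0.9, -0.9, 0.9, -0.9]
    print(log_mse(clean, enhanced))
    print(sdr_db(clean, enhanced))
```

Under these assumptions, a lower `log_mse` corresponds to a higher `sdr_db` for a fixed clean signal, which is one plausible reason a log-scaled error would suit SDR-oriented evaluation.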
Pages: 3816-3822
Number of pages: 7