Joint Time-Frequency and Time Domain Learning for Speech Enhancement

Times cited: 0

Authors:

Tang, Chuanxin [1]

Luo, Chong [1]

Zhao, Zhiyuan [1]

Xie, Wenxuan [1]

Zeng, Wenjun [1]

Affiliation:

[1] Microsoft Res Asia, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

For single-channel speech enhancement, both time-domain and time-frequency (T-F) domain methods have their respective pros and cons. In this paper, we present a cross-domain framework named TFT-Net, which takes a T-F spectrogram as input and produces a time-domain waveform as output. Such a framework takes advantage of the knowledge we have about spectrograms while avoiding some of the drawbacks that T-F-domain methods have long suffered from. In TFT-Net, we design an innovative dual-path attention block (DAB) to fully exploit correlations along the time and frequency axes. We further find that a sample-independent DAB (SDAB) achieves a good trade-off between enhanced speech quality and complexity. Ablation studies show that both the cross-domain design and the SDAB block bring large performance gains. When logarithmic MSE is used as the training criterion, TFT-Net achieves the highest signal-to-distortion ratio (SDR) and segmental signal-to-noise ratio (SSNR) among state-of-the-art methods on two major speech enhancement benchmarks.


Pages: 3816-3822

Number of pages: 7

50 entries in total

[31] Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising
Williamson, Donald S.
Wang, DeLiang
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (07) : 1492 - 1501
[32] An approach to digital watermarking of speech signals in the time-frequency domain
Stankovic, Srdjan
Orovic, Irena
Zaric, Nikola
Ioana, Cornel
[J]. PROCEEDINGS ELMAR-2006, 2006, : 127 - 130
[33] Hybrid Time-Frequency Domain Articulatory Speech Synthesizer
Sondhi, Man Mohan
Schroeter, Juergen
[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987, ASSP-35 (07) : 955 - 967
[34] Joint Time-Frequency Channel Estimation for Time Domain Synchronous OFDM Systems
Dai, Linglong
Wang, Zhaocheng
Wang, Jun
Yang, Zhixing
[J]. IEEE TRANSACTIONS ON BROADCASTING, 2013, 59 (01) : 168 - 173
[35] Speech endpoint detection based on speech time-frequency enhancement and spectral entropy
Fan, Yingle
Li, Yi
Wu, Chuanyan
[J]. 2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 4682 - 4684
[36] Intelligibility evaluation of enhanced whisper in joint time-frequency domain
Zhao, Li
[J]. 1600, Southeast University (30)
[37] Emotion Recognition Based on Data Enhancement in Time-Frequency Domain
Li, Qianqian
Ren, Fuji
Shen, Xiaoyan
Kang, Xin
[J]. INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS 2020, 2020, 11574
[38] An Adaptive Time-Frequency Analysis Scheme for Improved Real-Time Speech Enhancement
Andersen, Kristian Timm
Moonen, Marc
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[39] LFM radar signal detection in the joint time-frequency domain
Grishin, Yury
Niczyporuk, Wojciech
[J]. PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2007, PTS 1 AND 2, 2007, 6937
[40] Joint time-frequency domain identification of nonlinearly controlled structures
Jin, Gang
Sain, Michael K.
Spencer, Billie F., Jr.
Pham, Khanh D.
[J]. MODELING, SIMULATION, AND VERIFICATION OF SPACE-BASED SYSTEMS III, 2006, 6221
