A Study on Sampling of STFT Modifications in Time and Frequency Domains for DNN-Based Speech Dereverberation

Cited by: 0

Authors:

Wu, Bo [1]

Li, Kehuang [2]

Yang, Minglei [1]

Lee, Chin-Hui [2]

Affiliations:

[1] Xidian Univ, Natl Lab Radar Signal Proc, Xian, Peoples R China

[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2016

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

We investigate the effects of time and frequency sampling on short-time Fourier transform (STFT) modifications to be used for speech dereverberation based on deep neural networks (DNNs). We first show that by adopting a linear activation function at the output layer and globally normalizing the target features to zero mean and unit variance, better performance can be obtained than with existing DNN approaches. Then we show that the quality of dereverberated speech can be degraded by denser sampling in time for longer reverberation times, even at the price of increased computational complexity, which calls for an adaptive time sampling strategy. On the other hand, the difference between the unwrapped phases of reverberant and anechoic speech becomes negligible with dense sampling in frequency, implying reduced speech distortion. Therefore, there is great potential to enhance DNN-based acoustic signal processing if the conventional sampling strategy is carefully adjusted.
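The two ingredients named in the abstract, global normalization of the regression targets and the choice of STFT sampling in time and frequency, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the 257-dimensional synthetic targets, the one-second signal at 16 kHz, and the frame, hop, and FFT lengths are all assumptions chosen for illustration; the abstract gives none of these values.

```python
import numpy as np


def global_stats(features):
    """Per-dimension mean and standard deviation over all training frames."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against constant dimensions to avoid division by zero.
    std = np.where(std < 1e-8, 1.0, std)
    return mean, std


def normalize(features, mean, std):
    """Map target features to zero mean and unit variance."""
    return (features - mean) / std


def denormalize(outputs, mean, std):
    """Undo the normalization on the output of a linear output layer."""
    return outputs * std + mean


def stft_grid(num_samples, frame_len, hop, n_fft):
    """Frame count (time sampling) and one-sided bin count (frequency sampling)."""
    num_bins = n_fft // 2 + 1
    if num_samples < frame_len:
        return 0, num_bins
    num_frames = 1 + (num_samples - frame_len) // hop
    return num_frames, num_bins


# Synthetic stand-in for target features (hypothetical data, 1000 frames x 257 dims).
rng = np.random.default_rng(0)
targets = rng.normal(loc=5.0, scale=3.0, size=(1000, 257))

mean, std = global_stats(targets)
normed = normalize(targets, mean, std)
restored = denormalize(normed, mean, std)

# One second of audio at an assumed 16 kHz sampling rate.
coarse = stft_grid(16000, frame_len=512, hop=256, n_fft=512)
dense_time = stft_grid(16000, frame_len=512, hop=128, n_fft=512)
dense_freq = stft_grid(16000, frame_len=512, hop=256, n_fft=1024)
```

Halving the hop roughly doubles the number of frames the network must process, while doubling the FFT length (zero-padding each frame) roughly doubles the number of frequency bins per frame. Because the output layer is linear, its outputs are unbounded and can be mapped back to the original feature scale with `denormalize`.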

Pages: 4

50 items in total

[1] DNN-Based Linear Prediction Residual Enhancement for Speech Dereverberation
Feng, Xinyang
Li, Nuo
He, Zunwen
Zhang, Yan
Zhang, Wancheng
[J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 541 - 545
[2] INVERTIBLE DNN-BASED NONLINEAR TIME-FREQUENCY TRANSFORM FOR SPEECH ENHANCEMENT
Takeuchi, Daiki
Yatabe, Kohei
Koizumi, Yuma
Oikawa, Yasuhiro
Harada, Noboru
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6644 - 6648
[3] Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis
Song, Eunwoo
Soong, Frank K.
Kang, Hong-Goo
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2253 - 2257
[4] A study of speaker adaptation for DNN-based speech synthesis
Wu, Zhizheng
Swietojanski, Pawel
Veaux, Christophe
Renals, Steve
King, Simon
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883
[5] DNN-Based Arabic Speech Synthesis
Amrouche, Aissa
Bentrcia, Youssouf
Boubakeur, Khadidja Nesrine
Abed, Ahcene
[J]. 2022 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ICEEE 2022), 2022, : 378 - 382
[6] Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
Jiang, Wenbin
Wen, Fei
Liu, Peilin
[J]. IEEE ACCESS, 2018, 6 : 52385 - 52392
[7] DNN-BASED ENHANCEMENT OF NOISY AND REVERBERANT SPEECH
Zhao, Yan
Wang, DeLiang
Merks, Ivo
Zhang, Tao
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6525 - 6529
[8] DNN-BASED SPEECH RECOGNITION FOR GLOBALPHONE LANGUAGES
Tachbelie, Martha Yifiru
Abulimiti, Ayimunishagu
Abate, Solomon Teferra
Schultz, Tanja
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8269 - 8273
[9] IMPACT OF SINGLE-MICROPHONE DEREVERBERATION ON DNN-BASED MEETING TRANSCRIPTION SYSTEMS
Yoshioka, Takuya
Chen, Xie
Gales, Mark J. F.
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[10] A DNN-based emotional speech synthesis by speaker adaptation
Yang, Hongwu
Zhang, Weizhao
Zhi, Pengpeng
[J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
