A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Times cited: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; NOISE;
DOI
10.1109/icassp40776.2020.9054499
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to clean speech directly in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture that enables dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with average STOI and PESQ improvements of 0.23 and 1.38 across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent given the low complexity, although it is relatively weaker compared to Wave-U-Net, which has 7.25 times more parameters.
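The convolutional stage of such hybrid architectures typically widens its temporal context by stacking dilated convolutions with growing dilation rates, so that the receptive field grows quickly without adding parameters. As a minimal illustration (not the paper's code), the sketch below computes the receptive field of a stack of dilated 1-D convolutions; the kernel size of 3 and the doubling dilation schedule are assumptions for the example, not values taken from the DCCRN paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of dilated 1-D convolutions.

    Each layer with dilation d and kernel size k extends the receptive
    field by (k - 1) * d samples, starting from a single sample.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Hypothetical WaveNet-style schedule: dilation doubles at every layer.
dilations = [1, 2, 4, 8, 16]
print(receptive_field(3, dilations))  # 63
```

With only five layers and kernel size 3, the stack already spans 63 time-domain samples, which is why dilated convolutions are a common first stage for context aggregation before a recurrent layer models longer-range dependencies.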
Pages: 366-370 (5 pages)
Related papers (50 total)
  • [31] Towards an End-to-End Speech Recognition Model for Accurate Quranic Recitation
    Al-Fadhli, Sumayya
    Al-Harbi, Hajar
    Cherif, Asma
    2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023,
  • [32] Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
    Skerry-Ryan, R. J.
    Battenberg, Eric
    Xiao, Ying
    Wang, Yuxuan
    Stanton, Daisy
    Shor, Joel
    Weiss, Ron J.
    Clark, Rob
    Saurous, Rif A.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [33] Towards multilingual end-to-end speech recognition for air traffic control
    Lin, Yi
    Yang, Bo
    Guo, Dongyue
    Fan, Peng
    IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (09) : 1203 - 1214
  • [34] Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
    Ajisafe, Daniel
    Adegboro, Oluwabukola
    Oduntan, Esther
    Arulogun, Tayo
    arXiv, 2020,
  • [35] Exploring end-to-end framework towards Khasi speech recognition system
    Syiem, Bronson
    Singh, L. Joyprakash
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 419 - 424
  • [36] Towards Efficient End-to-End Encryption for Container Checkpointing Systems
    Stoyanov, Radostin
    Reber, Adrian
    Ueno, Daiki
    Clapinski, Michal
    Vagin, Andrei
    Bruno, Rodrigo
    PROCEEDINGS OF THE 15TH ACM SIGOPS ASIA-PACIFIC WORKSHOP ON SYSTEMS, APSYS 2024, 2024, : 60 - 66
  • [37] An efficient method for end-to-end available bandwidth measurement
    Lin, LD
    Jia, WJ
    Performance Challenges for Efficient Next Generation Networks, Vols 6A-6C, 2005, 6A-6C : 253 - 262
  • [38] An Efficient and High Fidelity Vietnamese Streaming End-to-End Speech Synthesis
    Tho Tran
    The Chuong Chu
    Hoang Vu
    Trung Bui
    Truong, Steven Q. H.
    INTERSPEECH 2022, 2022, : 466 - 470
  • [39] Efficient decoding self-attention for end-to-end speech synthesis
    Zhao, Wei
    Xu, Li
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (07) : 1127 - 1138
  • [40] MA-Net: Resource-efficient multi-attentional network for end-to-end speech enhancement
    Wahab, Fazal E.
    Ye, Zhongfu
    Saleem, Nasir
    Ullah, Rizwan
    Hussain, Amir
    NEUROCOMPUTING, 2025, 619