A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Times cited: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; NOISE;
DOI
10.1109/icassp40776.2020.9054499
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to clean speech directly in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture that enables dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with average STOI and PESQ improvements of 0.23 and 1.38 across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent given the low complexity, although it is relatively weaker compared to Wave-U-Net, which has 7.25 times more parameters.
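The convolutional stage of such hybrid architectures typically widens its temporal context by stacking dilated convolutions with growing dilation rates, so that the receptive field grows quickly without adding parameters. As a minimal illustration (not the paper's code), the sketch below computes the receptive field of a stack of dilated 1-D convolutions; the kernel size of 3 and the doubling dilation schedule are assumptions for the example, not values taken from the DCCRN paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of dilated 1-D convolutions.

    Each layer with dilation d and kernel size k extends the receptive
    field by (k - 1) * d samples, starting from a single sample.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Hypothetical WaveNet-style schedule: dilation doubles at every layer.
dilations = [1, 2, 4, 8, 16]
print(receptive_field(3, dilations))  # 63
```

With only five layers and kernel size 3, the stack already spans 63 time-domain samples, which is why dilated convolutions are a common first stage for context aggregation before a recurrent layer models longer-range dependencies.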
Pages: 366-370 (5 pages)
Related papers (50 total)
  • [31] Towards an End-to-End Speech Recognition Model for Accurate Quranic Recitation
    Al-Fadhli, Sumayya
    Al-Harbi, Hajar
    Cherif, Asma
    2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023,
  • [32] Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
    Skerry-Ryan, R. J.
    Battenberg, Eric
    Xiao, Ying
    Wang, Yuxuan
    Stanton, Daisy
    Shor, Joel
    Weiss, Ron J.
    Clark, Rob
    Saurous, Rif A.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [33] Towards multilingual end-to-end speech recognition for air traffic control
    Lin, Yi
    Yang, Bo
    Guo, Dongyue
    Fan, Peng
    IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (09) : 1203 - 1214
  • [34] Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
    Ajisafe, Daniel
    Adegboro, Oluwabukola
    Oduntan, Esther
    Arulogun, Tayo
    arXiv, 2020,
  • [35] Exploring end-to-end framework towards Khasi speech recognition system
    Syiem, Bronson
    Singh, L. Joyprakash
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (02) : 419 - 424
  • [36] Towards Efficient End-to-End Encryption for Container Checkpointing Systems
    Stoyanov, Radostin
    Reber, Adrian
    Ueno, Daiki
    Clapinski, Michal
    Vagin, Andrei
    Bruno, Rodrigo
    PROCEEDINGS OF THE 15TH ACM SIGOPS ASIA-PACIFIC WORKSHOP ON SYSTEMS, APSYS 2024, 2024, : 60 - 66
  • [37] An efficient method for end-to-end available bandwidth measurement
    Lin, LD
    Jia, WJ
    Performance Challenges for Efficient Next Generation Networks, Vols 6A-6C, 2005, 6A-6C : 253 - 262
  • [38] An Efficient and High Fidelity Vietnamese Streaming End-to-End Speech Synthesis
    Tho Tran
    The Chuong Chu
    Hoang Vu
    Trung Bui
    Truong, Steven Q. H.
    INTERSPEECH 2022, 2022, : 466 - 470
  • [39] Efficient decoding self-attention for end-to-end speech synthesis
    Zhao, Wei
    Xu, Li
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (07) : 1127 - 1138
  • [40] MA-Net: Resource-efficient multi-attentional network for end-to-end speech enhancement
    Wahab, Fazal E.
    Ye, Zhongfu
    Saleem, Nasir
    Ullah, Rizwan
    Hussain, Amir
    NEUROCOMPUTING, 2025, 619