A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Cited by: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; NOISE;
DOI
10.1109/icassp40776.2020.9054499
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206 ; 082403 ;
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal directly into clean speech in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture that enables dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with average improvements of 0.23 in STOI and 1.38 in PESQ across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent given its low complexity, although it is weaker than that of Wave-U-Net, which has 7.25 times more parameters.
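The abstract credits dilated convolutions (followed by a recurrent second stage) for aggregating long temporal context at low cost. As a rough illustration only, not the paper's actual DCCRN, the sketch below implements a plain causal 1-D dilated convolution in NumPy and the receptive-field arithmetic that makes doubling dilations attractive; both function names are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D dilated convolution with zero left-padding ('same' length)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        # tap i looks back i * dilation samples
        y[t] = sum(kernel[i] * xp[pad + t - i * dilation] for i in range(k))
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples seen by a stack of dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 3 and dilations 1, 2, 4, 8, 16, five layers already cover 63 samples of context: stacking exponentially growing dilations widens the receptive field exponentially while the parameter count grows only linearly, which is the usual argument for this design in time-domain enhancement networks.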
Pages: 366 - 370
Page count: 5
Related Papers
(50 records)
  • [1] A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain
    Pang, Jian
    Li, Hongcheng
    Jiang, Tao
    Wang, Hui
    Liao, Xiangning
    Luo, Le
    Liu, Hongqing
    APPLIED SCIENCES-BASEL, 2023, 13 (13):
  • [2] End-to-End Dual-Branch Network Towards Synthetic Speech Detection
    Ma, Kaijie
    Feng, Yifan
    Chen, Beijing
    Zhao, Guoying
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 359 - 363
  • [3] WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement
    Hsieh, Tsun-An
    Wang, Hsin-Min
    Lu, Xugang
    Tsao, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 (27) : 2149 - 2153
  • [4] Tacotron: Towards End-to-End Speech Synthesis
    Wang, Yuxuan
    Skerry-Ryan, R. J.
    Stanton, Daisy
    Wu, Yonghui
    Weiss, Ron J.
    Jaitly, Navdeep
    Yang, Zongheng
    Xiao, Ying
    Chen, Zhifeng
    Bengio, Samy
Le, Quoc
    Agiomyrgiannakis, Yannis
    Clark, Rob
    Saurous, Rif A.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 4006 - 4010
  • [5] Towards End-to-End Synthetic Speech Detection
    Hua, Guang
    Teoh, Andrew Beng Jin
    Zhang, Haijian
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 (28) : 1265 - 1269
  • [6] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [7] SPEECH ENHANCEMENT USING END-TO-END SPEECH RECOGNITION OBJECTIVES
    Subramanian, Aswin Shanmugam
    Wang, Xiaofei
    Baskar, Murali Karthick
    Watanabe, Shinji
    Taniguchi, Toru
    Tran, Dung
    Fujita, Yuya
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 234 - 238
  • [8] Towards Context-Aware End-to-End Code-Switching Speech Recognition
    Qiu, Zimeng
    Li, Yiyuan
    Li, Xinjian
    Metze, Florian
    Campbell, William M.
    INTERSPEECH 2020, 2020, : 4776 - 4780
  • [9] An End-to-End Speech Enhancement Method Combining Attention Mechanism to Improve GAN
    Chen, Wei
    Cai, Yichao
    Yang, Qingyu
    Wang, Ge
    Liu, Taian
    Liu, Xinying
    2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 538 - 542
  • [10] DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION
    Pundak, Golan
    Sainath, Tara N.
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Zhao, Ding
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 418 - 425