A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Authors
Zhen, Kai [1, 2]
Lee, Mi Suk [3]
Kim, Minje [1, 2]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; noise
DOI
10.1109/icassp40776.2020.9054499
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal directly to clean speech in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with an average STOI improvement of 0.23 and PESQ of 1.38 at three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent considering its low complexity, although it is relatively weaker than that of Wave-U-Net, which has 7.25 times more parameters.
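
The keywords list dilated convolution as one ingredient of the context aggregation. As a rough illustration of why dilation helps gather temporal context at low model complexity, the sketch below compares the receptive field of a dilated stack with that of a plain stack of the same depth. The kernel size, layer count, and dilation rates are hypothetical values chosen for illustration; they are not taken from the paper, and the function name `receptive_field` is likewise an assumption.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the receptive field by
    (kernel_size - 1) * d samples.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical configuration (not from the paper): six layers, kernel size 3.
KERNEL_SIZE = 3
dilated_rates = [1, 2, 4, 8, 16, 32]  # dilation doubles at every layer
plain_rates = [1] * 6                 # ordinary convolutions, no dilation

dilated_rf = receptive_field(KERNEL_SIZE, dilated_rates)
plain_rf = receptive_field(KERNEL_SIZE, plain_rates)

# Both stacks hold the same number of kernel taps (6 layers x 3 taps),
# so the wider context of the dilated stack adds no extra weights.
print("dilated stack:", dilated_rf, "samples")
print("plain stack:  ", plain_rf, "samples")
```

Under these assumed settings the dilated stack covers a much longer span of the waveform than the plain stack while using the same number of kernel weights, which is the general motivation for dilation in time-domain models.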
Pages: 366-370
Number of pages: 5