Monaural Speech Enhancement Based on Attention-Gate Dilated Convolution Network

Cited by: 0
Authors
Zhang Tianqi [1 ]
Bai Haojun [1 ]
Ye Shaopeng [1 ]
Liu Jianxing [1 ]
Affiliations
[1] Chongqing University of Posts and Telecommunications (CQUPT), School of Communication and Information Engineering, Chongqing 400065, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Speech enhancement; Dilated convolution; Residual learning; Gate mechanism; Attention mechanism;
DOI
10.11999/JEIT210654
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
In supervised speech enhancement, contextual information has a strong influence on the estimation of the target speech. To capture richer globally correlated speech features while keeping the parameter count as small as possible, this paper designs a new convolutional network for speech enhancement. The proposed network consists of three parts: an encoding layer, a transfer layer, and a decoding layer. The encoding and decoding parts use a Two-Dimensional Asymmetric Dilated Residual (2D-ADR) module, which significantly reduces the number of training parameters, enlarges the receptive field, and improves the model's ability to capture contextual information. The transfer layer uses a One-Dimensional Gated Dilated Residual (1D-GDR) module, which combines dilated convolution, residual learning, and a gating mechanism to selectively pass features and capture more temporal dependencies. Eight 1D-GDR modules are stacked with dense skip connections to strengthen the information flow between layers and provide additional gradient propagation paths. Finally, the corresponding encoding and decoding layers are connected by skip connections, and an attention mechanism is introduced so that the decoding process obtains more robust low-level features. In the experiments, different parameter settings and comparison methods are used to verify the effectiveness and robustness of the network. Trained and tested on 28 kinds of noise, the proposed method achieves better objective and subjective metrics than the compared methods with only 1.25 million parameters, and shows better enhancement performance and generalization ability.
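As an illustration of the transfer-layer idea described in the abstract (dilated convolution + gating + residual learning, with eight blocks joined by dense skip connections), the following is a minimal PyTorch sketch. The kernel size, channel count, exponentially growing dilation rates, tanh/sigmoid gate, and the 1x1 fusion convolutions are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn


class GatedDilatedResidual1D(nn.Module):
    """One 1D gated dilated residual block: dilated conv + gate + residual (illustrative)."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        padding = (kernel_size - 1) * dilation // 2  # keep the frame length unchanged
        # Two parallel dilated convolutions: one for content, one for the gate.
        self.content = nn.Conv1d(channels, channels, kernel_size,
                                 padding=padding, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size,
                              padding=padding, dilation=dilation)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The sigmoid branch decides which features the block passes on (gating);
        # the residual connection eases gradient flow through the stacked blocks.
        gated = torch.tanh(self.content(x)) * torch.sigmoid(self.gate(x))
        return x + self.norm(gated)


class DenseGDRStack(nn.Module):
    """Eight blocks with growing dilation; each block receives the concatenation of
    all earlier outputs through a 1x1 convolution (dense skip connections)."""

    def __init__(self, channels: int, num_blocks: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            GatedDilatedResidual1D(channels, dilation=2 ** i) for i in range(num_blocks))
        self.fuse = nn.ModuleList(
            nn.Conv1d(channels * (i + 1), channels, kernel_size=1) for i in range(num_blocks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]
        for block, fuse in zip(self.blocks, self.fuse):
            merged = fuse(torch.cat(outputs, dim=1))   # dense skip connections
            outputs.append(block(merged))
        return outputs[-1]


if __name__ == "__main__":
    stack = DenseGDRStack(channels=64)
    frames = torch.randn(2, 64, 100)      # (batch, feature channels, time frames)
    print(stack(frames).shape)            # torch.Size([2, 64, 100])

This fragment only mirrors the gating, dilation, residual, and dense-connection ideas from the abstract; the 2D-ADR encoder/decoder and the attention-gated skip connections of the full network are not reproduced here.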
Pages: 3277-3288
Number of pages: 12