Dense CNN With Self-Attention for Time-Domain Speech Enhancement

Cited by: 5
Authors
Pandey, Ashutosh [1 ]
Wang, DeLiang [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Speech enhancement; Convolution; Time-domain analysis; Signal-to-noise ratio; Noise measurement; Training; Feature extraction; Self-attention network; Time-domain enhancement; Dense convolutional network; Frequency-domain loss; Convolutional neural network
DOI
10.1109/TASLP.2021.3064421
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Speech enhancement in the time domain has become increasingly popular in recent years, owing to its ability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and of a predicted noise signal. Even though the proposed loss uses magnitudes only, the constraint imposed by noise prediction ensures that it improves both magnitude and phase. Experimental results demonstrate that a DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
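The magnitude-plus-noise loss described in the abstract could be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes an additive mixing model (noisy = clean + noise), an L1 distance between spectral magnitudes, and generic Hann-window STFT parameters, none of which are specified in this record. The predicted noise is obtained by subtracting the enhanced waveform from the noisy input, which is what couples the magnitude-only loss to the phase of the estimate.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    # Hann-windowed magnitude STFT. The frame length and hop are
    # generic choices; the paper's analysis parameters are assumptions here.
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * win for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))

def magnitude_noise_loss(enhanced, clean, noisy):
    # Spectral-magnitude distance for the enhanced speech ...
    speech_term = np.mean(np.abs(stft_mag(enhanced) - stft_mag(clean)))
    # ... plus the same distance for the predicted noise (noisy - enhanced)
    # against the true noise (noisy - clean). Because the predicted noise
    # depends on the time-domain estimate, a phase error in `enhanced`
    # changes the noise magnitudes and is therefore penalized.
    noise_term = np.mean(
        np.abs(stft_mag(noisy - enhanced) - stft_mag(noisy - clean))
    )
    return speech_term + noise_term
```

A perfect estimate drives both terms to zero, while an estimate with the correct magnitude but wrong phase still incurs a penalty through the noise term.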
Pages: 1270-1279 (10 pages)
Related papers (50 records)
  • [1] Time domain speech enhancement with CNN and time-attention transformer
    Saleem, Nasir
    Gunawan, Teddy Surya
    Dhahbi, Sami
    Bourouis, Sami
    DIGITAL SIGNAL PROCESSING, 2024, 147
  • [2] SELF-ATTENTION WITH RESTRICTED TIME CONTEXT AND RESOLUTION IN DNN SPEECH ENHANCEMENT
    Strake, Maximilian
    Behlke, Adrian
    Fingscheidt, Tim
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022
  • [3] A Nested U-Net With Self-Attention and Dense Connectivity for Monaural Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 105 - 109
  • [4] SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION
    Kong, Zhifeng
    Ping, Wei
    Dantrey, Ambrish
    Catanzaro, Bryan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7867 - 7871
  • [5] SELF-ATTENTION GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
    Huy Phan
    Nguyen, Huy Le
    Chen, Oliver Y.
    Koch, Philipp
    Duong, Ngoc Q. K.
    McLoughlin, Ian
    Mertins, Alfred
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7103 - 7107
  • [6] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [7] Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement
    Oostermeijer, Koen
    Wang, Qing
    Du, Jun
    INTERSPEECH 2021, 2021, : 2831 - 2835
  • [8] Visually Assisted Time-Domain Speech Enhancement
    Ideli, Elham
    Sharpe, Bruce
    Bajic, Ivan V.
    Vaughan, Rodney G.
    2019 7TH IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (IEEE GLOBALSIP), 2019
  • [9] Masked multi-head self-attention for causal speech enhancement
    Nicolson, Aaron
    Paliwal, Kuldip K.
    SPEECH COMMUNICATION, 2020, 125 : 80 - 96
  • [10] Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement
    Asiedu Asante, Bismark Kweku
    Broni-Bediako, Clifford
    Imamura, Hiroki
    APPLIED SCIENCES-BASEL, 2023, 13 (16)