Dense CNN With Self-Attention for Time-Domain Speech Enhancement

Cited by: 5
Authors
Pandey, Ashutosh [1 ]
Wang, DeLiang [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Speech enhancement; convolution; time-domain analysis; signal-to-noise ratio; noise measurement; training; feature extraction; self-attention network; time-domain enhancement; dense convolutional network; frequency-domain loss; convolutional neural network
DOI
10.1109/TASLP.2021.3064421
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Speech enhancement in the time domain has become increasingly popular in recent years because it can jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. The DCN is an encoder-decoder architecture with skip connections, in which each layer of the encoder and the decoder comprises a dense block and an attention module. The dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximal context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of the enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of both the enhanced speech and the predicted noise. Even though the proposed loss uses magnitudes only, the constraint imposed by noise prediction ensures that it enhances both magnitude and phase. Experimental results demonstrate that a DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
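The abstract describes the proposed loss only at a high level. Below is a minimal PyTorch sketch of a magnitude loss constrained by noise prediction, in the spirit described above: the predicted noise is taken as the noisy mixture minus the enhanced output, so matching both magnitude spectra implicitly constrains the phase of the enhanced signal. The function names, the STFT settings (512-point FFT, 256-sample hop, Hann window), and the equal weighting of the two terms are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch

def spectral_magnitude_loss(ref, est, n_fft=512, hop=256):
    """Mean absolute error between the STFT magnitudes of two waveforms."""
    window = torch.hann_window(n_fft, device=ref.device)
    ref_spec = torch.stft(ref, n_fft, hop_length=hop, window=window,
                          return_complex=True)
    est_spec = torch.stft(est, n_fft, hop_length=hop, window=window,
                          return_complex=True)
    return (ref_spec.abs() - est_spec.abs()).abs().mean()

def phase_constrained_magnitude_loss(clean, enhanced, noisy):
    """Magnitude loss on the speech plus magnitude loss on the implied noise.

    The predicted noise is defined as noisy - enhanced; because the noisy
    mixture is fixed, also matching the noise magnitude constrains the phase
    of the enhanced signal, even though only magnitudes are compared.
    The equal 0.5 weighting is an assumption of this sketch.
    """
    speech_term = spectral_magnitude_loss(clean, enhanced)
    noise_term = spectral_magnitude_loss(noisy - clean, noisy - enhanced)
    return 0.5 * (speech_term + noise_term)
```

In training, the enhanced waveform would come from the time-domain network, e.g. loss = phase_constrained_magnitude_loss(clean, model(noisy), noisy), and gradients flow through the STFT back to the time-domain output.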
Pages: 1270-1279
Page count: 10