Dense CNN With Self-Attention for Time-Domain Speech Enhancement

Cited by: 5
Authors
Pandey, Ashutosh [1 ]
Wang, DeLiang [1 ,2 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Speech enhancement; Convolution; Time-domain analysis; Signal to noise ratio; Noise measurement; Training; Feature extraction; self-attention network; time-domain enhancement; dense convolutional network; frequency-domain loss; convolutional neural network
DOI
10.1109/TASLP.2021.3064421
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Speech enhancement in the time domain has become increasingly popular in recent years, owing to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules aid feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and of a predicted noise. Even though the proposed loss is based on magnitudes only, the constraint imposed by the noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
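A minimal PyTorch sketch of such a magnitude-plus-noise loss, assuming the noise is predicted as the residual noisy - enhanced and an L1 distance between STFT magnitudes; the function names, the mixing weight alpha, and the STFT parameters are illustrative assumptions, not the paper's exact formulation:

    import torch

    def stft_magnitude(signal, n_fft=512, hop=256):
        # STFT magnitude; window and frame sizes here are illustrative,
        # not necessarily the paper's configuration.
        window = torch.hann_window(n_fft, device=signal.device)
        spec = torch.stft(signal, n_fft=n_fft, hop_length=hop,
                          window=window, return_complex=True)
        return spec.abs()

    def magnitude_plus_noise_loss(enhanced, clean, noisy, alpha=0.5):
        # The noise is predicted implicitly as the residual of the noisy
        # mixture minus the enhanced output, so the loss constrains the
        # time-domain waveform (and hence the phase) even though only
        # magnitudes are compared.
        predicted_noise = noisy - enhanced
        true_noise = noisy - clean
        speech_term = torch.mean(torch.abs(
            stft_magnitude(enhanced) - stft_magnitude(clean)))
        noise_term = torch.mean(torch.abs(
            stft_magnitude(predicted_noise) - stft_magnitude(true_noise)))
        # alpha is an assumed mixing weight between the two magnitude terms.
        return alpha * speech_term + (1.0 - alpha) * noise_term

In training, enhanced would be the DCN's output for the noisy waveform, with clean as the reference signal; penalizing the predicted-noise magnitude alongside the speech magnitude is what ties the magnitude-only objective back to the phase.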
Pages: 1270 - 1279
Page count: 10
Related Papers
50 items in total
  • [41] Multilingual Speech Recognition with Self-Attention Structured Parameterization
    Zhu, Yun
    Haghani, Parisa
    Tripathi, Anshuman
    Ramabhadran, Bhuvana
    Farris, Brian
    Xu, Hainan
    Lu, Han
    Sak, Hasim
    Leal, Isabel
    Gaur, Neeraj
    Moreno, Pedro J.
    Zhang, Qian
    INTERSPEECH 2020, 2020, : 4741 - 4745
  • [42] ON THE USEFULNESS OF SELF-ATTENTION FOR AUTOMATIC SPEECH RECOGNITION WITH TRANSFORMERS
    Zhang, Shucong
    Loweimi, Erfan
    Bell, Peter
    Renals, Steve
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 89 - 96
  • [43] Multi-Stride Self-Attention for Speech Recognition
    Han, Kyu J.
    Huang, Jing
    Tang, Yun
    He, Xiaodong
    Zhou, Bowen
    INTERSPEECH 2019, 2019, : 2788 - 2792
  • [44] ESAformer: Enhanced Self-Attention for Automatic Speech Recognition
    Li, Junhua
    Duan, Zhikui
    Li, Shiren
    Yu, Xinmei
    Yang, Guangguang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 471 - 475
  • [45] NEPALI SPEECH RECOGNITION USING SELF-ATTENTION NETWORKS
    Joshi, Basanta
    Shrestha, Rupesh
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2023, 19 (06): 1769 - 1784
  • [46] Remote Sensing Time Series Classification Based on Self-Attention Mechanism and Time Sequence Enhancement
    Liu, Jingwei
    Yan, Jining
    Wang, Lizhe
    Huang, Liang
    He, Haixu
    Liu, Hong
    REMOTE SENSING, 2021, 13 (09)
  • [47] Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement
    Song, Zhendong
    Ma, Yupeng
    Tan, Fang
    Feng, Xiaoyi
APPLIED SCIENCES-BASEL, 2022, 12 (07)
  • [48] Time-Domain Multi-Modal Bone/Air Conducted Speech Enhancement
    Yu, Cheng
    Hung, Kuo-Hsuan
    Wang, Syu-Siang
    Tsao, Yu
    Hung, Jeih-weih
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 1035 - 1039
  • [49] Time-domain structural analysis of speech
    Ekstein, K
    Moucek, R
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 506 - 510
  • [50] Self-Attention Underwater Image Enhancement by Data Augmentation
    Gao, Yu
    Luo, Huifu
    Zhu, Wei
    Ma, Feng
    Zhao, Jiang
    Qin, Kailin
    PROCEEDINGS OF 2020 3RD INTERNATIONAL CONFERENCE ON UNMANNED SYSTEMS (ICUS), 2020, : 991 - 995