Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

Cited by: 3
Authors
Zhang, Zehua [1]
Zhang, Lu [2]
Zhuang, Xuyi [1]
Qian, Yukun [1]
Wang, Mingjiang [1]
Affiliations
[1] Harbin Institute of Technology, 6, Pingshan 1st Rd, Taoyuan St, Shenzhen 518000, Guangdong, People's Republic of China
[2] NIO Automobile Co., Ltd., Lane 56, Antuo Rd, Shanghai 201800, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Supervised attention; Monaural speech enhancement; Complex compressed spectrum; Complex ratio mask; Multi-scale temporal convolutional network; Noise estimation; Neural networks; Self-attention; Algorithm; Model
DOI
10.1186/s13636-024-00341-x
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech signals are often distorted by reverberation and noise across a wide range of signal-to-noise ratios (SNRs). To address this, our study develops robust deep neural network (DNN)-based speech enhancement methods. We reproduce several DNN-based monaural speech enhancement methods and outline a strategy for constructing datasets. This strategy, validated through these reproductions, effectively improves the denoising efficiency and robustness of the models. We then propose a causal speech enhancement system named Supervised Attention Multi-Scale Temporal Convolutional Network (SA-MSTCN). SA-MSTCN uses the complex compressed spectrum (CCS) for input encoding and complex ratio masking (CRM) for output decoding. The supervised attention module, a lightweight addition to SA-MSTCN, guides feature extraction. Experimental results show that the supervised attention module effectively improves noise reduction performance with only a minor increase in computational cost. The multi-scale temporal convolutional network refines the receptive field and better reconstructs the speech signal. Overall, SA-MSTCN not only achieves state-of-the-art speech quality and intelligibility compared with other methods but also maintains stable denoising performance across various environments.
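The abstract does not say exactly how CCS encoding and CRM decoding are computed. Below is a minimal NumPy sketch of one common formulation, offered as an illustration only. It assumes a power-law magnitude compression exponent of 0.3 (`ALPHA`, a hypothetical value not confirmed by this record). It also assumes that the complex mask is applied to the compressed noisy spectrum before decompression. The helper `causal_receptive_field` uses hypothetical kernel sizes and dilations, not SA-MSTCN's actual configuration. It shows how stacked causal dilated convolutions with different kernel sizes cover different temporal contexts, which is one way a multi-scale TCN can vary its receptive field. All function names are illustrative.

```python
import numpy as np

# Hypothetical compression exponent (a common choice in the literature; not confirmed for SA-MSTCN).
ALPHA = 0.3


def compress_spectrum(spec: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    """Complex compressed spectrum: compress the magnitude with |X|**alpha, keep the phase."""
    return (np.abs(spec) ** alpha) * np.exp(1j * np.angle(spec))


def decompress_spectrum(spec_c: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    """Invert compress_spectrum by raising the magnitude to 1 / alpha, keeping the phase."""
    return (np.abs(spec_c) ** (1.0 / alpha)) * np.exp(1j * np.angle(spec_c))


def apply_crm(mask_real: np.ndarray, mask_imag: np.ndarray, noisy_c: np.ndarray) -> np.ndarray:
    """Complex ratio masking: complex multiplication of the mask with the (compressed) noisy spectrum."""
    est_real = mask_real * noisy_c.real - mask_imag * noisy_c.imag
    est_imag = mask_real * noisy_c.imag + mask_imag * noisy_c.real
    return est_real + 1j * est_imag


def causal_receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field, in frames, of a stack of causal dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Usage: random complex "spectrogram" (frequency bins x frames) stands in for an STFT.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((161, 100)) + 1j * rng.standard_normal((161, 100))
noisy_c = compress_spectrum(noisy)

# Hypothetical clean target; its ideal CRM is computed in the compressed domain.
clean = 0.5 * noisy
ideal_mask = compress_spectrum(clean) / noisy_c

# Decode: apply the mask, then undo the compression.
est = decompress_spectrum(apply_crm(ideal_mask.real, ideal_mask.imag, noisy_c))
print(np.allclose(est, clean))  # True

# Two hypothetical branches with different kernel sizes but the same dilations.
print(causal_receptive_field(3, [1, 2, 4, 8]))  # 31
print(causal_receptive_field(5, [1, 2, 4, 8]))  # 61
```

Compressing the magnitude while keeping the phase reduces the dynamic range the network has to model. Expressing the mask as a complex product lets it correct phase as well as magnitude; in SA-MSTCN the mask would be predicted by the network rather than computed from a known clean target as above.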
Pages: 16