Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement

Cited by: 3
Authors
Zhang, Zehua [1 ]
Zhang, Lu [2 ]
Zhuang, Xuyi [1 ]
Qian, Yukun [1 ]
Wang, Mingjiang [1 ]
Affiliations
[1] Harbin Inst Technol, 6,Pingshan 1st Rd,Taoyuan St, Shenzhen 518000, Guangdong, Peoples R China
[2] NIO Automobile Co LTD, Lane 56,Antuo Rd, Shanghai 201800, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Supervised attention; Monaural speech enhancement; Complex compressed spectrum; Complex ratio mask; Multi-scale temporal convolutional network; NOISE-ESTIMATION; NEURAL-NETWORKS; SELF-ATTENTION; ALGORITHM; MODEL;
DOI
10.1186/s13636-024-00341-x
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech signals are often distorted by reverberation and noise, with a widely distributed signal-to-noise ratio (SNR). To address this, our study develops robust, deep neural network (DNN)-based speech enhancement methods. We reproduce several DNN-based monaural speech enhancement methods and outline a strategy for constructing datasets. This strategy, validated through experimental reproductions, has effectively enhanced the denoising efficiency and robustness of the models. Then, we propose a causal speech enhancement system named Supervised Attention Multi-Scale Temporal Convolutional Network (SA-MSTCN). SA-MSTCN extracts the complex compressed spectrum (CCS) for input encoding and employs complex ratio masking (CRM) for output decoding. The supervised attention module, a lightweight addition to SA-MSTCN, guides feature extraction. Experimental results show that the supervised attention module effectively improves noise reduction performance with a minor increase in computational cost. The multi-scale temporal convolutional network refines the perceptual field and better reconstructs the speech signal. Overall, SA-MSTCN not only achieves state-of-the-art speech quality and intelligibility compared to other methods but also maintains stable denoising performance across various environments.
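The abstract names two signal-domain steps, a complex compressed spectrum at the input and complex ratio masking at the output, without giving formulas. The sketch below illustrates one common reading of each, not the authors' implementation. It assumes that compression means power-law scaling of the magnitude with the phase left unchanged, and that the mask is applied by element-wise complex multiplication with the noisy spectrum. The function names and the exponent value 0.5 are hypothetical choices made only for illustration.

```python
import cmath


def compress(spec, power=0.5):
    """Power-law compress each bin's magnitude and keep its phase.

    Assumed form of a complex compressed spectrum; `power` is a
    hypothetical exponent, not a value taken from the paper.
    """
    return [cmath.rect(abs(z) ** power, cmath.phase(z)) for z in spec]


def decompress(spec, power=0.5):
    """Invert `compress` by raising the magnitude to 1 / power."""
    return [cmath.rect(abs(z) ** (1.0 / power), cmath.phase(z)) for z in spec]


def ideal_crm(clean, noisy, eps=1e-12):
    """Ideal complex ratio mask: clean bin divided by noisy bin.

    Bins whose noisy magnitude is below `eps` are divided by `eps`
    instead, to avoid division by zero.
    """
    return [s / (y if abs(y) > eps else eps) for s, y in zip(clean, noisy)]


def apply_crm(mask, noisy):
    """Apply a complex ratio mask by element-wise complex multiplication."""
    return [m * y for m, y in zip(mask, noisy)]


# Toy two-bin spectra (made-up numbers, for illustration only).
noisy = [3 + 4j, 1 - 2j]
clean = [1 + 1j, 0.5 - 1j]

mask = ideal_crm(clean, noisy)       # in a DNN system the network would estimate this
estimate = apply_crm(mask, noisy)    # recovers `clean` up to rounding error
roundtrip = decompress(compress(noisy))
```

In a trained system the mask would be predicted by the network rather than computed from the clean signal; the ideal mask appears here only so that the multiplication step can be checked against a known target.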
Pages: 16
Related papers
50 in total
  • [31] Multi-Scale Convolutional Neural Network for Temporal Knowledge Graph Completion
    Wei Liu
    Peijie Wang
    Zhihui Zhang
    Qiong Liu
    [J]. Cognitive Computation, 2023, 15 : 1016 - 1022
  • [32] Msap: multi-scale attention probabilistic network for underwater image enhancement network
    Chang, Baocai
    Li, Jinjiang
    Wang, Haiyang
    Li, Mengjun
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (SUPPL 1) : 653 - 661
  • [33] FB-MSTCN: A FULL-BAND SINGLE-CHANNEL SPEECH ENHANCEMENT METHOD BASED ON MULTI-SCALE TEMPORAL CONVOLUTIONAL NETWORK
    Zhang, Zehua
    Zhang, Lu
    Zhuang, Xuyi
    Qian, Yukun
    Li, Heng
    Wang, Mingjiang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 9276 - 9280
  • [34] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Chao Sun
    Min Zhang
    Ruijuan Wu
    Junhong Lu
    Guo Xian
    Qin Yu
    Xiaofeng Gong
    Ruisen Luo
    [J]. Scientific Reports, 11
  • [35] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Sun, Chao
    Zhang, Min
    Wu, Ruijuan
    Lu, Junhong
    Xian, Guo
    Yu, Qin
    Gong, Xiaofeng
    Luo, Ruisen
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)
  • [36] Multi-scale temporal features extraction based graph convolutional network with attention for multivariate time series prediction
    Chen, Yawen
    Ding, Fengqian
    Zhai, Linbo
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 200
  • [37] Selective Deeply Supervised Multi-Scale Attention Network for Brain Tumor Segmentation
    Rehman, Azka
    Usman, Muhammad
    Shahid, Abdullah
    Latif, Siddique
    Qadir, Junaid
    [J]. SENSORS, 2023, 23 (04)
  • [38] Multi-Scale Attention Feature Enhancement Network for Single Image Dehazing
    Dong, Weida
    Wang, Chunyan
    Sun, Hao
    Teng, Yunjie
    Xu, Xiping
    [J]. SENSORS, 2023, 23 (19)
  • [39] Multi-scale Underwater Image Enhancement Network Based on Attention Mechanism
    Fang Ming
    Liu Xiaohan
    Fu Feiran
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (12) : 3513 - 3521
  • [40] Multi-Scale Attention Generative Adversarial Network for Medical Image Enhancement
    Zhong, Guojin
    Ding, Weiping
    Chen, Long
    Wang, Yingxu
    Yu, Yu-Feng
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (04) : 1113 - 1125