An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement

Cited by: 3
Authors
Xu, Zezheng [1 ]
Jiang, Ting [1 ]
Li, Chao [1 ]
Yu, Jiacheng [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing 100000, People's Republic of China
Funding
Major Program of the National Natural Science Foundation of China; National Natural Science Foundation of China
Keywords
speech enhancement; fully convolutional neural networks; self-attention; Huber Loss; TIME;
DOI
10.1109/ISCSLP49672.2021.9362114
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional neural networks (CNNs) have achieved remarkable results in speech enhancement. However, because the convolution operation is local, it struggles to capture the global context of the feature map. To address this problem, we propose an attention-augmented fully convolutional neural network for monaural speech enhancement. Specifically, the method integrates a new two-dimensional relative self-attention mechanism into a fully convolutional network. In addition, we adopt the Huber loss as the loss function, which is more robust to noise. Experimental results indicate that, compared with the optimally modified log-spectral amplitude (OMLSA) estimator and other CNN-based models, the proposed network performs better on five evaluation metrics and strikes a good balance between noise suppression and speech distortion. Moreover, embedding the proposed attention mechanism into other convolutional networks also yields satisfactory results, demonstrating its strong generalization ability.
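The abstract names two components that can be made concrete: a convolutional layer whose output is augmented with self-attention computed over the whole time-frequency feature map, and the Huber loss as the training objective. The sketch below is a minimal PyTorch illustration of that general idea, not the authors' implementation: it uses a single attention head, omits the relative positional terms of the paper's two-dimensional relative self-attention, and the module name AttentionAugmentedConv2d, all layer sizes, and the delta value are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AttentionAugmentedConv2d(nn.Module):
    """Standard convolution concatenated with single-head 2-D self-attention
    computed over all time-frequency positions of the feature map."""

    def __init__(self, in_ch: int, conv_ch: int, attn_ch: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, conv_ch, kernel_size, padding=kernel_size // 2)
        # 1x1 convolutions produce queries, keys and values for the attention branch.
        self.qkv = nn.Conv2d(in_ch, 3 * attn_ch, kernel_size=1)
        self.attn_out = nn.Conv2d(attn_ch, attn_ch, kernel_size=1)
        self.attn_ch = attn_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq_bins, frames), e.g. noisy spectrogram features.
        b, _, f, t = x.shape
        conv_feat = self.conv(x)                          # local features

        q, k, v = self.qkv(x).chunk(3, dim=1)             # each (b, attn_ch, f, t)
        q = q.flatten(2).transpose(1, 2)                  # (b, f*t, attn_ch)
        k = k.flatten(2)                                  # (b, attn_ch, f*t)
        v = v.flatten(2).transpose(1, 2)                  # (b, f*t, attn_ch)

        # Scaled dot-product attention over every time-frequency position,
        # giving each position a global receptive field.
        attn = torch.softmax(q @ k / self.attn_ch ** 0.5, dim=-1)
        attn_feat = (attn @ v).transpose(1, 2).reshape(b, self.attn_ch, f, t)
        attn_feat = self.attn_out(attn_feat)

        # Concatenate local (conv) and global (attention) features.
        return torch.cat([conv_feat, attn_feat], dim=1)


if __name__ == "__main__":
    layer = AttentionAugmentedConv2d(in_ch=1, conv_ch=28, attn_ch=4)
    noisy = torch.randn(2, 1, 129, 100)                   # (batch, channel, freq bins, frames)
    enhanced = layer(noisy)                               # -> (2, 32, 129, 100)

    # Huber loss: quadratic for small residuals, linear for large ones,
    # hence less sensitive to outliers than the mean squared error.
    target = torch.randn_like(enhanced)
    loss = nn.HuberLoss(delta=1.0)(enhanced, target)
    print(enhanced.shape, loss.item())
```

Note that attention over all F*T positions costs memory quadratic in the number of time-frequency bins, so in practice the attention branch is typically applied to downsampled feature maps; the sizes above are purely illustrative.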
Pages: 5