A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

Cited by: 0
Authors
Wang, Yu-Xuan [1 ]
Du, Jun [1 ]
Chai, Li [1 ]
Lee, Chin-Hui [2 ]
Pan, Jia [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
INTERSPEECH 2020
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
attention mechanism; memory block; noise-aware training; LSTM-RNN; speech enhancement;
DOI
10.21437/Interspeech.2020-2037
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification
100104; 100213;
Abstract
We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming to improve the quality of enhanced speech under unseen noise conditions. The NAMAN architecture consists of three parts: a main regression network, a memory block, and an attention block. First, a long short-term memory recurrent neural network (LSTM-RNN) is adopted as the main network to model the acoustic context of neighboring frames. Next, the memory block is built from an extensive set of noise feature vectors that serve as prior noise bases. Finally, the attention block serves as an auxiliary network that improves the noise awareness of the main network by encoding dynamic noise information at the frame level, producing additional features as attention-weighted combinations of the noise basis vectors in the memory block. Our experiments show that the proposed NAMAN framework is compact and outperforms state-of-the-art dynamic noise-aware training approaches in low-SNR conditions.
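For concreteness, the abstract's three-part design can be sketched in a few lines of PyTorch: an attention block scores each frame of noisy features against a memory of noise basis vectors, and the attended noise embedding is concatenated with the input before the LSTM-RNN regression network. This is a minimal sketch assuming dot-product attention, fusion by concatenation, and a learnable memory; every layer size and name below is an illustrative assumption, not a detail confirmed by the paper (which builds its memory from an extensive set of real noise feature vectors).

import torch
import torch.nn as nn

class NAMANSketch(nn.Module):
    def __init__(self, feat_dim=257, n_bases=64, noise_dim=257, hidden=512):
        super().__init__()
        # Memory block: prior noise bases. Learnable here for simplicity;
        # the paper populates this from noise feature vectors.
        self.memory = nn.Parameter(torch.randn(n_bases, noise_dim))
        # Attention block: projects each noisy frame into a query space.
        self.query = nn.Linear(feat_dim, noise_dim)
        # Main regression network: LSTM-RNN over noisy features plus the
        # frame-level noise embedding.
        self.lstm = nn.LSTM(feat_dim + noise_dim, hidden,
                            num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, noisy):                  # (batch, frames, feat_dim)
        q = self.query(noisy)                  # (batch, frames, noise_dim)
        scores = q @ self.memory.t()           # (batch, frames, n_bases)
        weights = torch.softmax(scores, dim=-1)
        noise_emb = weights @ self.memory      # dynamic per-frame noise info
        x = torch.cat([noisy, noise_emb], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                     # enhanced spectral features

model = NAMANSketch()
enhanced = model(torch.randn(4, 100, 257))     # batch of 4, 100 frames each

The point this sketch illustrates is the one the abstract emphasizes: the attention weights re-weight a fixed set of noise bases at every frame, so the main network receives dynamic noise information without an explicit noise estimator.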
Pages: 4501-4505
Page count: 5
Related Papers
50 records in total
  • [31] Towards efficient video-based action recognition: context-aware memory attention network
    Koh, Thean Chun
    Yeo, Chai Kiat
    Jing, Xuan
    Sivadas, Sunil
    [J]. SN APPLIED SCIENCES, 2023, 5 (12):
  • [33] A STUDY OF TRAINING TARGETS FOR DEEP NEURAL NETWORK-BASED SPEECH ENHANCEMENT USING NOISE PREDICTION
    Odelowo, Babafemi O.
    Anderson, David V.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5409 - 5413
  • [34] A NOISE PREDICTION AND TIME-DOMAIN SUBTRACTION APPROACH TO DEEP NEURAL NETWORK BASED SPEECH ENHANCEMENT
    Odelowo, Babafemi O.
    Anderson, David V.
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 372 - 377
  • [35] SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network
    Lv R.
    Chen N.
    Cheng S.
    Fan G.
    Rao L.
    Song X.
    Lv W.
    Yang D.
    [J]. Mathematical Biosciences and Engineering, 2024, 21 (03) : 3860 - 3875
  • [36] Adaptive Speech Intelligibility Enhancement for Far-and-Near-end Noise Environments Based on Self-attention StarGAN
    Li, Dengshi
    Zhao, Lanxin
    Xiao, Jing
    Liu, Jiaqi
    Guan, Duanzheng
    Wang, Qianrui
    [J]. MULTIMEDIA MODELING, MMM 2022, PT II, 2022, 13142 : 205 - 217
  • [37] Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model
    Xue, Cheng
    Huang, Weilong
    Chen, Weiguang
    Feng, Jinwei
    [J]. INTERSPEECH 2021, 2021, : 1862 - 1866
  • [38] TENSOR-TO-VECTOR REGRESSION FOR MULTI-CHANNEL SPEECH ENHANCEMENT BASED ON TENSOR-TRAIN NETWORK
    Qi, Jun
    Hu, Hu
    Wang, Yannan
    Yang, Chao-Han Huck
    Siniscalchi, Sabato Marco
    Lee, Chin-Hui
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7504 - 7508
  • [39] iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
    Li, Haoyu
    Fu, Szu-Wei
    Tsao, Yu
    Yamagishi, Junichi
    [J]. INTERSPEECH 2020, 2020, : 1336 - 1340
  • [40] Subjective Evaluation of a Noise-Reduced Training Target for Deep Neural Network-Based Speech Enhancement
    Gelderblom, Femke B.
Tronstad, Tron V.
    Viggen, Erlend Magnus
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 583 - 594