A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

Cited by: 0
Authors
Wang, Yu-Xuan [1]
Du, Jun [1]
Chai, Li [1]
Lee, Chin-Hui [2]
Pan, Jia [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
attention mechanism; memory block; noise-aware training; LSTM-RNN; speech enhancement;
DOI
10.21437/Interspeech.2020-2037
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming to improve the quality of enhanced speech in unseen noise conditions. The NAMAN architecture consists of three parts: a main regression network, a memory block, and an attention block. First, a long short-term memory recurrent neural network (LSTM-RNN) is adopted as the main network to model the acoustic context of neighboring frames. Next, the memory block is built from an extensive set of noise feature vectors that serve as prior noise bases. Finally, the attention block serves as an auxiliary network that improves the noise awareness of the main network: it encodes dynamic noise information at the frame level through additional features obtained by weighting the existing noise basis vectors in the memory block. Our experiments show that the proposed NAMAN framework is compact and outperforms state-of-the-art dynamic noise-aware training approaches in low-SNR conditions.
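The weighting step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The scaled dot-product scoring, the use of the noisy frame itself as the attention query, the concatenation with the input features, and all names and sizes (`attend_noise_memory`, `T`, `D`, `K`) are assumptions made only for illustration. The LSTM-RNN main network is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attend_noise_memory(query, memory):
    """Weight the noise basis vectors in a memory block, frame by frame.

    query:  (T, D) array, one vector per frame (assumed: the noisy features).
    memory: (K, D) array, K prior noise basis vectors.
    Returns (weights, noise_feature) with shapes (T, K) and (T, D).
    """
    # Assumed scoring function: scaled dot product between frame and basis.
    scores = query @ memory.T / np.sqrt(memory.shape[1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    noise_feature = weights @ memory     # weighted sum of basis vectors
    return weights, noise_feature

# Hypothetical sizes: 4 frames, 8-dimensional features, 16 noise bases.
rng = np.random.default_rng(0)
T, D, K = 4, 8, 16
noisy_frames = rng.standard_normal((T, D))
memory = rng.standard_normal((K, D))

weights, noise_feature = attend_noise_memory(noisy_frames, memory)

# Assumed fusion: append the frame-level noise feature to the input that
# would be fed to the main regression network.
augmented = np.concatenate([noisy_frames, noise_feature], axis=1)
```

In this sketch each frame receives its own weight vector over the memory block, so the additional noise feature can change from frame to frame, which is the sense in which the abstract calls the noise information dynamic.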
Pages: 4501-4505
Page count: 5
Related papers
  • [21] Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement
    Qi, Jun
    Hu, Hu
    Wang, Yannan
    Yang, Chao-Han Huck
    Siniscalchi, Sabato Marco
    Lee, Chin-Hui
    INTERSPEECH 2020, 2020, : 76 - 80
  • [22] Regression-Based Neuro-Fuzzy Network Trained by ABC Algorithm for High-Density Impulse Noise Elimination
    Caliskan, Abdullah
    Cil, Zeynel Abidin
    Badem, Hasan
    Karaboga, Dervis
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2020, 28 (06) : 1084 - 1095
  • [23] Speech Enhancement via Attention Masking Network (SEAMNET): An End-to-End System for Joint Suppression of Noise and Reverberation
    Borgstrom, Bengt J.
    Brandstein, Michael S.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 515 - 526
  • [24] JOINT NOISE AND MASK AWARE TRAINING FOR DNN-BASED SPEECH ENHANCEMENT WITH SUB-BAND FEATURES
    Wang, Qing
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 101 - 105
  • [26] An auditory-based adaptive speech enhancement system by neural network according to noise intensity
    Choi, J
    Okamoto, J
    Nakajima, S
    Suzuki, Y
    Hosokawa, S
    42ND MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, PROCEEDINGS, VOLS 1 AND 2, 1999, : 993 - 996
  • [28] An efficient recurrent Rats function network (Rrfn) based speech enhancement through noise reduction
    Srinivasarao, V.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (21) : 30599 - 30614
  • [29] Neural network-based adaptive noise cancellation for enhancement of speech auditory brainstem responses
    Gholami-Boroujeny, Shiva
    Fallatah, Anwar
    Heffernan, Brian P.
    Dajani, Hilmi R.
    SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (02) : 389 - 395
  • [30] A Noise-type and Level-dependent MPO-based Speech Enhancement Architecture with Variable Frame Analysis for Noise-robust Speech Recognition
    Mitra, Vikramjit
    Borgstrom, Bengt J.
    Espy-Wilson, Carol Y.
    Alwan, Abeer
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2731 - +