Multi-stage temporal representation learning via global and local perspectives for real-time speech enhancement

Cited by: 0
Authors
Chau, Hoang Ngoc [1]
Linh, Nguyen Thi Nhat [1]
Doan, Tuan Kiet [1]
Nguyen, Quoc Cuong [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam
Keywords
Speech enhancement; Deep learning-based; Global and local modeling; Self-attention; Graph convolution; NEURAL-NETWORK; DOMAIN; BEAMFORMER; ATTENTION
DOI
10.1016/j.apacoust.2024.110067
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Deep learning-based speech enhancement algorithms have developed rapidly over the past few years. Although numerous approaches have been proposed, the global and local information in speech features has not been thoroughly investigated. In this paper, we introduce a novel and highly effective speech enhancement network called the Multi-stage Global-Local Network (MSGLN), which exploits both local and global information via temporal self-attention, temporal graph convolution, and 1D convolution. Local modeling blocks capture the fast changes in speech signals, while global modeling blocks learn long-term trends in noise or speech signals through factors such as pitch, tone, resonance, timbre, and rhythm. In addition, we propose a multi-stage temporal processing module as the bottleneck of a complex convolutional encoder-decoder structure to guide the network to learn different acoustic structures at different scales. A dual-path RNN postprocessing module is then integrated to reconstruct the speech spectrum mask using a frequency-wise temporal refinement block followed by a frame-wise spectral refinement block. Experimental results demonstrate the superior performance of the proposed method compared with other state-of-the-art methods on both single- and multi-channel real-time speech enhancement tasks.
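The split between local and global temporal modeling described in the abstract can be illustrated with a toy sketch. This is not the MSGLN architecture: it assumes a causal 1D convolution as the local branch, a causal self-attention with identity query/key/value projections as the global branch, and a simple sum to merge them. The function names, the kernel values, and the merge rule are all hypothetical choices made for illustration, and the graph convolution branch is omitted.

```python
import math

def causal_conv1d(frames, kernel):
    """Local branch: causal 1D convolution along time, applied per feature.

    kernel[0] weights the current frame, kernel[1] the previous one, and so on.
    Frames before the start of the sequence are treated as zeros.
    """
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            if t - k >= 0:
                for d in range(dim):
                    acc[d] += w * frames[t - k][d]
        out.append(acc)
    return out

def causal_self_attention(frames):
    """Global branch: scaled dot-product self-attention along time.

    Queries, keys and values are the frames themselves (identity projections,
    an assumption made for brevity). Frame t attends only to frames 0..t,
    which keeps the block usable in a real-time (causal) setting.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for t, q in enumerate(frames):
        scores = [scale * sum(a * b for a, b in zip(q, frames[s]))
                  for s in range(t + 1)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        ctx = [sum(weights[s] * frames[s][d] for s in range(t + 1)) / total
               for d in range(dim)]
        out.append(ctx)
    return out

def global_local_block(frames, kernel=(0.5, 0.3, 0.2)):
    """Sum the local and global views of the same frame sequence (hypothetical merge)."""
    local = causal_conv1d(frames, kernel)
    global_ctx = causal_self_attention(frames)
    return [[l + g for l, g in zip(lf, gf)]
            for lf, gf in zip(local, global_ctx)]

# Four time frames with two features each (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mixed = global_local_block(frames)
```

The convolution only sees a few neighbouring frames, so it reacts to fast changes, while the attention weights span the whole past sequence and can pick up longer-term structure. Because both branches are causal, changing a later frame leaves the outputs for earlier frames unchanged.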
Pages: 10
Related papers
50 items in total
  • [1] DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT
    Li, Jingdong
    Luo, Dawei
    Liu, Yun
    Zhu, Yuanyuan
    Li, Zhaoxia
    Cui, Guohui
    Tang, Wenqi
    Chen, Wei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6638 - 6642
  • [2] Enhancing Feature Representation for Anomaly Detection via Local-and-Global Temporal Relations and a Multi-stage Memory
    Li, Xuan
    Ma, Ding
    Wu, Xiangqian
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 121 - 133
  • [3] Real-Time Text Steganalysis Based on Multi-Stage Transfer Learning
    Peng, Wanli
    Zhang, Jinyu
    Xue, Yiming
    Yang, Zhenghong
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1510 - 1514
  • [4] Multi-stage Progressive Learning-Based Speech Enhancement Using Time–Frequency Attentive Squeezed Temporal Convolutional Networks
    Chaitanya Jannu
    Sunny Dayal Vanambathina
    [J]. Circuits, Systems, and Signal Processing, 2023, 42 : 7467 - 7493
  • [5] Multi-stage Progressive Learning-Based Speech Enhancement Using Time-Frequency Attentive Squeezed Temporal Convolutional Networks
    Jannu, Chaitanya
    Vanambathina, Sunny Dayal
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (12) : 7467 - 7493
  • [6] A new algorithm for real-time multi-stage image thresholding
    Lin, SM
    Giesen, R
    Nair, D
    [J]. MACHINE VISION APPLICATIONS IN INDUSTRIAL INSPECTION XIV, 2006, 6070
  • [7] Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks
    Lin, Ju
    van Wijngaarden, Adriaan J. de Lind
    Wang, Kuang-Ching
    Smith, Melissa C.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3440 - 3450
  • [8] Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility
    Koutsogiannaki, Maria
    Francois, Holly
    Choo, Kihyun
    Oh, Eunmi
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1973 - 1977
  • [9] Real-time Local Feature with Global Visual Information Enhancement
    Miao, Jinyu
    Yue, Haosong
    Liu, Zhong
    Wu, Xingming
    Fang, Zaojun
    Yang, Guilin
    [J]. 2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 189 - 194
  • [10] MULTI-STAGE GRAPH REPRESENTATION LEARNING FOR DIALOGUE-LEVEL SPEECH EMOTION RECOGNITION
    Song, Yaodong
    Liu, Jiaxing
    Wang, Longbiao
    Yu, Ruiguo
    Dang, Jianwu
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6432 - 6436