Multi-stage temporal representation learning via global and local perspectives for real-time speech enhancement

Cited by: 0
Authors
Chau, Hoang Ngoc [1]
Linh, Nguyen Thi Nhat [1]
Doan, Tuan Kiet [1]
Nguyen, Quoc Cuong [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi 100000, Vietnam
Keywords
Speech enhancement; Deep learning-based; Global and local modeling; Self-attention; Graph convolution; Neural network; Domain; Beamformer; Attention
DOI
10.1016/j.apacoust.2024.110067
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep learning-based speech enhancement algorithms have developed rapidly over the past few years. Although numerous approaches have been proposed, global and local information from speech features has not been thoroughly investigated. In this paper, we introduce a novel and highly effective speech enhancement network called Multi-stage Global-Local Network (MSGLN), which exploits both local and global information via temporal self-attention, temporal graph convolution, and 1D convolution. Local modeling blocks capture the fast changes in speech signals, while global modeling blocks learn long-term trends in noise or speech signals through factors such as pitch, tone, resonance, timbre, and rhythm. In addition, we propose a multi-stage temporal processing module as the bottleneck of a complex convolutional encoder-decoder structure to guide our network to learn different acoustic structures at different scales. Then a dual-path RNN postprocessing module is integrated to reconstruct the speech spectrum mask using a frequency-wise temporal refinement block followed by a frame-wise spectral refinement block. Experimental results demonstrate the superior performance of our proposed methodology compared to other state-of-the-art methods on both real-time single- and multi-channel speech enhancement tasks.
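The global/local split described in the abstract can be illustrated with a minimal sketch. It is not the authors' MSGLN implementation: it assumes causal processing (a common choice for real-time use), uses identity query/key/value projections instead of learned ones, omits the temporal graph convolution entirely, and fuses the two branches with a plain residual sum. The function names, the kernel values, and the fusion rule are all hypothetical.

```python
import math


def causal_self_attention(frames):
    """Global view: each frame attends to itself and all earlier frames.

    Assumption: identity Q/K/V projections (a real model would learn them).
    """
    dim = len(frames[0])
    out = []
    for t, query in enumerate(frames):
        # Scaled dot-product scores against frames 0..t only (causal).
        scores = [
            sum(q * k for q, k in zip(query, frames[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * frames[s][d] for s, w in enumerate(weights))
            for d in range(dim)
        ])
    return out


def causal_conv1d(frames, kernel):
    """Local view: weighted sum over the current frame and the few before it."""
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            if t - k >= 0:  # frames before the start count as zeros
                for d in range(dim):
                    acc[d] += w * frames[t - k][d]
        out.append(acc)
    return out


def global_local_block(frames, kernel=(0.5, 0.3, 0.2)):
    """Residual sum of both views (hypothetical fusion rule and kernel)."""
    global_view = causal_self_attention(frames)
    local_view = causal_conv1d(frames, kernel)
    return [
        [x + g + l for x, g, l in zip(fx, fg, fl)]
        for fx, fg, fl in zip(frames, global_view, local_view)
    ]
```

Calling `global_local_block([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns one output vector per input frame. Because both branches are causal, changing a later frame never alters an earlier output, which is the property a frame-by-frame real-time system relies on.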
Pages: 10