A Novel Low-Complexity Attention-Driven Composite Model for Speech Enhancement

Cited by: 1
Authors
Hasannezhad, Mojtaba [1]
Zhu, Wei-Ping [1]
Champagne, Benoit [2]
Affiliations
[1] Concordia Univ, Elect & Comp Engn, Montreal, PQ, Canada
[2] McGill Univ, Elect & Comp Engn, Montreal, PQ, Canada
Keywords
speech enhancement; dilated convolution; grouping strategy; attention technique; low complexity; NEURAL-NETWORK; TIME;
DOI
10.1109/ISCAS51556.2021.9401385
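Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809
Abstract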
Speech exhibits strong dependencies among its samples in both time and frequency domains. In this paper, we propose a low-complexity composite model for speech enhancement (SE) that integrates a convolutional neural network (CNN) and a long short-term memory (LSTM) network. These two modules take full advantage of the spectral and temporal information of input speech and extract in parallel a complementary set of features. The CNN is enabled to capture non-local spectral information via dilated frequency convolutions. It also incorporates an attention mechanism to recalibrate its weights without imposing considerable additional complexity. A grouping strategy is adopted for LSTM implementation to reduce its complexity while keeping performance almost unchanged. Our composite model is carefully designed to address concerns in real-time applications including limited computational resources, low-latency processing, and causal architecture. Through extensive and comparative simulation studies, it is shown that the proposed model significantly outperforms some other DNN-based SE methods in the recent literature.
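The complexity argument in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: the layer sizes, the group count, and all three function names are hypothetical, and the grouping is assumed to split the input and hidden features into equal, independent slices. It counts the parameters of a standard LSTM layer against a grouped one, and computes the receptive field of stacked dilated convolutions along the frequency axis.

```python
def lstm_params(input_size, hidden_size):
    """Weights and biases of one standard LSTM layer (4 gates, one bias vector per gate)."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def grouped_lstm_params(input_size, hidden_size, groups):
    """Parameter count when the layer is split into `groups` independent LSTMs,
    each seeing an equal slice of the input and hidden features (assumed scheme)."""
    if input_size % groups or hidden_size % groups:
        raise ValueError("input_size and hidden_size must be divisible by groups")
    return groups * lstm_params(input_size // groups, hidden_size // groups)


def dilated_receptive_field(kernel_size, dilations):
    """Receptive field, in frequency bins, of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    full = lstm_params(256, 256)
    grouped = grouped_lstm_params(256, 256, groups=4)
    print(full, grouped, round(full / grouped, 2))
    print(dilated_receptive_field(3, [1, 2, 4, 8]))
```

Under these assumptions the weight matrices shrink by roughly the number of groups while the bias count stays the same, and the dilated stack covers a wide span of frequency bins with only a few small kernels.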
Pages: 5
Related papers
50 in total
  • [1] Adaptive Attention-driven Speech Enhancement for EEG-informed Hearing Prostheses
    Das, Neetha
    Van Eyndhoven, Simon
    Francart, Tom
    Bertrand, Alexander
    [J]. 2016 38TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2016, : 77 - 80
  • [2] A MODEL OF ATTENTION-DRIVEN SCENE ANALYSIS
    Slaney, Malcolm
    Agus, Trevor
    Liu, Shih-Chii
    Kaya, Merve
    Elhilali, Mounya
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 145 - 148
  • [3] Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement
    Ge, Meng
    Wang, Longbiao
    Li, Nan
    Shi, Hao
    Dang, Jianwu
    Li, Xiangang
    [J]. INTERSPEECH 2019, 2019, : 3153 - 3157
  • [4] Divide and Conquer: A Low-complexity Neural Network for Monophonic Speech Enhancement
    Fang, Bingxiao
    Liu, Liang
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 944 - 949
  • [5] A deep attention-driven model to forecast solar irradiance
    Dairi, Abdelkader
    Harrou, Fouzi
    Sun, Ying
    [J]. 2021 IEEE 19TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2021,
  • [6] Attention-driven Factor Model for Explainable Personalized Recommendation
    Chen, Jingwu
    Zhuang, Fuzhen
    Hong, Xin
    Ao, Xiang
    Xie, Xing
    He, Qing
    [J]. ACM/SIGIR PROCEEDINGS 2018, 2018, : 909 - 912
  • [7] Parallel Attention-Driven Model for Student Performance Evaluation
    Olaniyan, Deborah
    Olaniyan, Julius
    Obagbuwa, Ibidun Christiana
    Esiefarienrhe, Bukohwo Michael
    Bernard, Olorunfemi Paul
    [J]. COMPUTERS, 2024, 13 (09)
  • [8] A novel content-based image retrieval approach based on attention-driven model
    Lu, Ying-Hua
    Zhang, Xiao-Hua
    Kong, Jun
    Wang, Xue-Feng
    [J]. 2007 INTERNATIONAL CONFERENCE ON WAVELET ANALYSIS AND PATTERN RECOGNITION, VOLS 1-4, PROCEEDINGS, 2007, : 510 - 515
  • [9] SPEECH ENHANCEMENT WITH A LOW-COMPLEXITY ONLINE SOURCE NUMBER ESTIMATOR USING DISTRIBUTED ARRAYS
    Taseska, Maja
    Khan, Affan Hasan
    Habets, Emanuel A. P.
    [J]. 2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 929 - 933
  • [10] Low-complexity, nonintrusive speech quality assessment
    Grancharov, Volodya
    Zhao, David Y.
    Lindblom, Jonas
    Kleijn, W. Bastiaan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (06): : 1948 - 1956