An End-to-End Speech Enhancement Method Combining Attention Mechanism to Improve GAN

Cited by: 0
Authors
Chen, Wei [1]
Cai, Yichao [1]
Yang, Qingyu [1]
Wang, Ge [1]
Liu, Taian [1]
Liu, Xinying [1]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Intelligent Equipment, Tai An, Peoples R China
Keywords
Generative Adversarial Networks; time series; attention mechanisms; SEGAN; PESQ; STOI; NOISE; SUPPRESSION; NETWORKS
DOI
10.1109/IAEAC54830.2022.9929534
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current generative adversarial networks (GANs) for speech tasks rely solely on convolution operations, which ignore dependencies across the time series and limit learning ability, so the enhanced speech still contains obvious residual noise. To address this, the paper proposes an end-to-end speech enhancement method that improves the GAN with attention: a combined mechanism fusing channel and spatial attention is applied between the convolutional layers of SEGAN, capturing more contextual information about the speech in both the channel and spatial dimensions and extracting more accurate features. Experimental results show that the method outperforms the baseline model in both speech quality and intelligibility. Across different signal-to-noise ratios, the Perceptual Evaluation of Speech Quality (PESQ) score improves by an average of 25.72%, and the Short-Time Objective Intelligibility (STOI) score improves by an average of 1.68%.
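The abstract describes the combined attention module only at a high level. As a rough illustration, and not the authors' implementation, the sketch below applies a channel gate followed by a spatial gate to a 1-D feature map of shape (channels, time), loosely in the style of CBAM-type attention. Everything specific here is an assumption: the function names, the channel-then-spatial ordering, the reduction ratio `r`, the kernel size `K`, the random weights, and the reading of "space" as the time axis of a 1-D convolutional feature map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Per-channel gate for x of shape (C, T); returns shape (C, 1).

    Average- and max-pooled channel descriptors pass through a shared
    two-layer MLP (w1: (C//r, C), w2: (C, C//r)) and are summed.
    """
    avg = x.mean(axis=1)
    mx = x.max(axis=1)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))[:, None]

def spatial_attention(x, kernel):
    """Per-time-step gate for x of shape (C, T); returns shape (1, T).

    Channel-wise average and max maps are stacked to (2, T) and filtered
    with a (2, K) kernel (K odd, zero padding, cross-correlation).
    """
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad)))
    steps = x.shape[1]
    logits = np.array(
        [np.sum(padded[:, t:t + k] * kernel) for t in range(steps)]
    )
    return sigmoid(logits)[None, :]

def combined_attention(x, w1, w2, kernel):
    """Assumed ordering: channel gate first, then spatial gate."""
    x = x * channel_attention(x, w1, w2)
    return x * spatial_attention(x, kernel)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, T, r, K = 16, 128, 4, 7
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, K)) * 0.1

y = combined_attention(x, w1, w2, kernel)
print(y.shape)  # (16, 128)
```

Because both gates are sigmoid outputs in (0, 1), the module only rescales features and leaves the tensor shape unchanged, which is what would allow such a block to be placed between existing convolutional layers without altering the rest of the network.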
Pages: 538-542
Number of pages: 5