SELF-ATTENTION WITH RESTRICTED TIME CONTEXT AND RESOLUTION IN DNN SPEECH ENHANCEMENT

Cited by: 0
Authors
Strake, Maximilian [1 ]
Behlke, Adrian [1 ]
Fingscheidt, Tim [1 ]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, D-38106 Braunschweig, Germany
Keywords
Speech enhancement; attention; fully convolutional networks; temporal modeling
DOI
10.1109/IWAENC53105.2022.9914702
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
The multi-head attention mechanism, which has been successfully applied in, e.g., machine translation and ASR, has also been found to be a promising approach for temporal modeling in speech enhancement DNNs. Since speech enhancement can be expected to benefit less from long-term temporal context than machine translation or ASR, we propose to employ self-attention with modified context access. We first show that restricting the temporal context used in the self-attention layers of a CNN-based network architecture is crucial for good speech enhancement performance. Furthermore, we propose to combine restricted attention with a subsampled attention variant that considers long-term context at a lower temporal resolution, which helps to effectively exploit both long- and short-term context. We show that our proposed attention-based network outperforms similar networks using RNNs for temporal modeling, as well as a strong reference method using unrestricted attention.
Pages: 5
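
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of time-restricted self-attention combined with a subsampled long-context attention variant. It is an illustration only, not the authors' implementation; the window size, the subsampling stride, the single-head setup, and the additive combination of the two branches are all assumptions.

import torch
import torch.nn.functional as F

def restricted_attention(q, k, v, window):
    # Scaled dot-product attention where frame t may only attend to
    # frames within +/- `window` (restricted temporal context).
    T, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (T, T)
    idx = torch.arange(T)
    blocked = (idx[None, :] - idx[:, None]).abs() > window  # True = masked out
    scores = scores.masked_fill(blocked, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def subsampled_attention(q, k, v, stride):
    # Long-term context at lower temporal resolution: keys/values are
    # subsampled by `stride`, so every frame sees the whole utterance
    # coarsely at roughly 1/stride of the full attention cost.
    k_s, v_s = k[..., ::stride, :], v[..., ::stride, :]
    d = q.shape[-1]
    scores = q @ k_s.transpose(-2, -1) / d ** 0.5         # (T, T//stride)
    return F.softmax(scores, dim=-1) @ v_s

# Toy usage for a single head: T frames, feature dimension d.
T, d = 128, 64
x = torch.randn(T, d)       # stand-in for projected frame features
q = k = v = x               # identity projections, for brevity only
y = restricted_attention(q, k, v, window=16) \
    + subsampled_attention(q, k, v, stride=8)  # short- plus long-term context
print(y.shape)              # torch.Size([128, 64])

Restricting the window bounds the cost of the first branch to O(T * window), while the strided branch retains a coarse view of the whole utterance at O(T^2 / stride); summing the two branches is one simple way to let each frame use both kinds of context.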