CAPTURING MULTI-RESOLUTION CONTEXT BY DILATED SELF-ATTENTION

Cited by: 6
|
Authors
Moritz, Niko [1 ]
Hori, Takaaki [1 ]
Le Roux, Jonathan [1 ]
Affiliations
[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
dilated self-attention; transformer; automatic speech recognition; computational complexity;
DOI
10.1109/ICASSP39728.2021.9415001
CLC Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Self-attention has become an important and widely used neural network component that helped to establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where an input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution. Different methods for summarizing distant frames are studied, such as subsampling, mean-pooling, and attention-based pooling. ASR results demonstrate substantial improvements compared to restricted self-attention alone, achieving similar results compared to full-sequence based self-attention with a fraction of the computational costs.
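The abstract describes combining restricted (windowed) self-attention with a dilation mechanism that summarizes distant frames, e.g. by mean-pooling. The following is a minimal NumPy sketch of that idea, not the authors' implementation: each query attends at full resolution to a local window of frames and at low resolution to mean-pooled summary frames covering the whole sequence. The function name, window size, and pooling factor are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(q, k, v, window=2, pool=4):
    """Sketch of dilated self-attention (single head, hypothetical API).

    Each query frame attends to:
      * its +/- `window` neighbors at full resolution (restricted attention),
      * mean-pooled summaries of the sequence (dilation mechanism),
    so cost grows roughly as O(T * (window + T/pool)) instead of O(T^2).
    """
    T, d = q.shape
    # Dilation branch: mean-pool keys/values into coarse summary frames.
    n_blocks = T // pool
    k_pool = k[:n_blocks * pool].reshape(n_blocks, pool, d).mean(axis=1)
    v_pool = v[:n_blocks * pool].reshape(n_blocks, pool, d).mean(axis=1)
    out = np.zeros_like(q)
    for t in range(T):
        # Restricted branch: high-resolution neighbors of the query.
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = np.concatenate([k[lo:hi], k_pool], axis=0)
        vals = np.concatenate([v[lo:hi], v_pool], axis=0)
        scores = keys @ q[t] / np.sqrt(d)
        out[t] = softmax(scores) @ vals
    return out
```

Subsampling or attention-based pooling, also studied in the paper, would replace the `.mean(axis=1)` summarization step.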
Pages: 5869 / 5873
Page count: 5
Related Papers
50 records in total
  • [1] Multi Resolution Analysis (MRA) for Approximate Self-Attention
    Zeng, Zhanpeng
    Pal, Sourav
    Kline, Jeffery
    Fung, Glenn
    Singh, Vikas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [2] SELF-ATTENTION WITH RESTRICTED TIME CONTEXT AND RESOLUTION IN DNN SPEECH ENHANCEMENT
    Strake, Maximilian
    Behlke, Adrian
    Fingscheidt, Tim
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [3] Capturing and presenting shared multi-resolution video
    Kimber, D
    Qiong, L
    Foote, J
    Wilcox, L
    INTERNET MULTIMEDIA MANAGEMENT SYSTEMS III, 2002, 4862 : 261 - 271
  • [4] Multi-feature self-attention super-resolution network
    Yang, Aiping
    Wei, Zihao
    Wang, Jinbin
    Cao, Jiale
    Ji, Zhong
    Pang, Yanwei
    VISUAL COMPUTER, 2024, 40 (05): : 3473 - 3486
  • [6] DILATED RESIDUAL NETWORK WITH MULTI-HEAD SELF-ATTENTION FOR SPEECH EMOTION RECOGNITION
    Li, Runnan
    Wu, Zhiyong
    Jia, Jia
    Zhao, Sheng
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6675 - 6679
  • [7] Context-Aware Self-Attention Networks
    Yang, Baosong
    Li, Jian
    Wong, Derek F.
    Chao, Lidia S.
    Wang, Xing
    Tu, Zhaopeng
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 387 - 394
  • [9] A Serial-Parallel Self-Attention Network Joint With Multi-Scale Dilated Convolution
    Gaihua, Wang
    Tianlun, Zhang
    Yingying, Dai
    Jinheng, Lin
    Lei, Cheng
    IEEE ACCESS, 2021, 9 : 71909 - 71919
  • [10] SELF-ATTENTION FOR AUDIO SUPER-RESOLUTION
    Rakotonirina, Nathanael Carraz
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,