CAPTURING MULTI-RESOLUTION CONTEXT BY DILATED SELF-ATTENTION

Cited: 6
Authors
Moritz, Niko [1 ]
Hori, Takaaki [1 ]
Le Roux, Jonathan [1 ]
Affiliations
[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
dilated self-attention; transformer; automatic speech recognition; computational complexity;
DOI
10.1109/ICASSP39728.2021.9415001
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Self-attention has become an important and widely used neural network component that helped to establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where an input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention attends to frames neighboring the query at high resolution, and the dilation mechanism summarizes distant information so that it can be attended to at a lower resolution. Different methods for summarizing distant frames are studied, such as subsampling, mean-pooling, and attention-based pooling. ASR results demonstrate substantial improvements compared to restricted self-attention alone, achieving results similar to full-sequence self-attention at a fraction of the computational cost.
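To make the mechanism described in the abstract concrete, the following is a minimal single-head NumPy sketch of dilated self-attention, assuming mean-pooling as the summarization method. The window size, pooling stride, weight shapes, and all function and variable names are illustrative assumptions, not the configuration or implementation used in the paper.

# Minimal single-head sketch of dilated self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(x, w_q, w_k, w_v, window=8, stride=4):
    """x: (T, d_model) input frames; returns (T, d_v) attention outputs.

    Each query attends (a) to neighboring frames within +/- `window` at full
    resolution and (b) to mean-pooled block summaries of the whole sequence
    as a low-resolution view of distant context.
    """
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]

    # Low-resolution keys/values: mean-pool non-overlapping blocks of `stride` frames.
    n_blocks = T // stride
    pooled = x[:n_blocks * stride].reshape(n_blocks, stride, -1).mean(axis=1)
    k_pool, v_pool = pooled @ w_k, pooled @ w_v

    out = np.empty((T, v.shape[-1]))
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        # For simplicity this sketch lets the query also see pooled blocks that
        # overlap its local window; the paper's exact handling may differ.
        k_t = np.concatenate([k[lo:hi], k_pool], axis=0)
        v_t = np.concatenate([v[lo:hi], v_pool], axis=0)
        scores = (k_t @ q[t]) / np.sqrt(d_k)
        out[t] = softmax(scores) @ v_t
    return out

# Usage example: 100 frames of 64-dim encoder features, d_k = d_v = 32.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 64))
w_q, w_k, w_v = (0.1 * rng.standard_normal((64, 32)) for _ in range(3))
print(dilated_self_attention(x, w_q, w_k, w_v).shape)  # (100, 32)

In this sketch each query compares against at most 2*window+1 local keys plus T/stride pooled keys, so the cost grows roughly as O(T*(window + T/stride)) instead of the O(T^2) of full-sequence self-attention.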
Pages: 5869 - 5873
Page count: 5
Related Papers
50 records in total
  • [21] Multi-entity sentiment analysis using self-attention based hierarchical dilated convolutional neural network
    Gan, Chenquan
    Wang, Lu
    Zhang, Zufan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 112 : 116 - 125
  • [22] MULTI-RESOLUTION MULTI-HEAD ATTENTION IN DEEP SPEAKER EMBEDDING
    Wang, Zhiming
    Yao, Kaisheng
    Li, Xiaolong
    Fang, Shuo
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6464 - 6468
  • [23] Multi-resolution extension for transmission of geodata in a mobile context
    Follin, JM
    Bouju, A
    Bertrand, F
    Boursier, P
    COMPUTERS & GEOSCIENCES, 2005, 31 (02) : 179 - 188
  • [24] Multi-Head Self-Attention Gated-Dilated Convolutional Neural Network for Word Sense Disambiguation
    Zhang, Chun-Xiang
    Zhang, Yu-Long
    Gao, Xue-Yao
    IEEE ACCESS, 2023, 11 : 14202 - 14210
  • [25] Context-embedded hypergraph attention network and self-attention for session recommendation
    Zhang, Zhigao
    Zhang, Hongmei
    Zhang, Zhifeng
    Wang, Bin
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [26] Optimal Deep Multi-Route Self-Attention for Single Image Super-Resolution
    Ngambenjavichaikul, Nisawan
    Chen, Sovann
    Aramvith, Supavadee
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1181 - 1186
  • [27] Dual Attention with the Self-Attention Alignment for Efficient Video Super-resolution
    Chu, Yuezhong
    Qiao, Yunan
    Liu, Heng
    Han, Jungong
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1140 - 1151
  • [28] Dialogue Act Classification with Context-Aware Self-Attention
    Raheja, Vipul
    Tetreault, Joel
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 3727 - 3733
  • [30] Multi-Scale Self-Attention for Text Classification
    Guo, Qipeng
    Qiu, Xipeng
    Liu, Pengfei
    Xue, Xiangyang
    Zhang, Zheng
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7847 - 7854