CAPTURING MULTI-RESOLUTION CONTEXT BY DILATED SELF-ATTENTION

Cited by: 6
Authors
Moritz, Niko [1 ]
Hori, Takaaki [1 ]
Le Roux, Jonathan [1 ]
Affiliations
[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
dilated self-attention; transformer; automatic speech recognition; computational complexity
DOI
10.1109/ICASSP39728.2021.9415001
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Self-attention has become an important and widely used neural network component that has helped establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where an input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention allows attention to neighboring frames of the query at high resolution, while the dilation mechanism summarizes distant information so that it can be attended to at lower resolution. Different methods for summarizing distant frames are studied, such as subsampling, mean-pooling, and attention-based pooling. ASR results demonstrate substantial improvements over restricted self-attention alone, achieving results similar to full-sequence self-attention at a fraction of the computational cost.
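To illustrate the mechanism described in the abstract, the following is a minimal PyTorch sketch of dilated self-attention, assuming single-head attention and mean-pooling as the summarization method. The function name, the window/dilation parameters, and the detail that the pooled summary spans the whole sequence are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of dilated self-attention: local frames at high
    # resolution plus a mean-pooled low-resolution summary of the sequence.
    # Hypothetical names and parameters; not the authors' implementation.
    import torch
    import torch.nn.functional as F


    def dilated_self_attention(q, k, v, window=16, dilation=8):
        """q, k, v: (T, D) tensors for a single utterance.

        Each query attends at full resolution to frames within +/- `window`
        of its own position, and at lower resolution to the rest of the
        sequence, summarized by mean-pooling every `dilation` frames.
        """
        T, D = q.shape
        scale = D ** -0.5

        # Low-resolution summary: mean-pool keys/values over blocks of
        # `dilation` frames (zero-padding the tail to a full block).
        pad = (-T) % dilation
        k_sum = F.pad(k, (0, 0, 0, pad)).reshape(-1, dilation, D).mean(dim=1)
        v_sum = F.pad(v, (0, 0, 0, pad)).reshape(-1, dilation, D).mean(dim=1)

        outputs = []
        for t in range(T):
            lo, hi = max(0, t - window), min(T, t + window + 1)
            # High-resolution part: the restricted local window around the
            # query. For simplicity the pooled summary spans the whole
            # sequence, so the local window is also covered at low resolution.
            k_all = torch.cat([k[lo:hi], k_sum], dim=0)
            v_all = torch.cat([v[lo:hi], v_sum], dim=0)
            att = F.softmax((q[t] @ k_all.T) * scale, dim=-1)
            outputs.append(att @ v_all)
        return torch.stack(outputs)


    x = torch.randn(200, 64)             # e.g. 200 encoder frames of dim 64
    y = dilated_self_attention(x, x, x)
    print(y.shape)                       # torch.Size([200, 64])

Under these assumptions, each query attends to at most 2*window + 1 + ceil(T/dilation) frames instead of all T frames, which is where the claimed reduction in computational cost comes from.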
Pages: 5869-5873 (5 pages)
Related Papers (50 records)
  • [11] Zhang, Yao; Ma, Yunpu; Seidl, Thomas; Tresp, Volker. Adaptive Multi-Resolution Attention with Linear Complexity. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
  • [12] Kocayusufoglu, Furkan; Wu, Tao; Singh, Anima; Roumpos, Georgios; Cheng, Heng-Tze; Jain, Sagar; Chi, Ed; Singh, Ambuj. Multi-Resolution Attention for Personalized Item Search. WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022: 508-516.
  • [13] Tang, Hao; Liu, Xingwei; Han, Kun; Xie, Xiaohui; Chen, Xuming; Qian, Huang; Liu, Yong; Sun, Shanlin; Bai, Narisu. Spatial Context-Aware Self-Attention Model for Multi-Organ Segmentation. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021: 938-948.
  • [14] Jin, Weike; Zhao, Zhou; Gu, Mao; Yu, Jun; Xiao, Jun; Zhuang, Yueting. Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), 2019: 465-474.
  • [15] Hou, Peng; Zhang, Jianjie; Jiang, Zhangzheng; Tang, Yiyu; Lin, Ying. A Bearing Fault Diagnosis Method Based on Dilated Convolution and Multi-Head Self-Attention Mechanism. Applied Sciences-Basel, 2023, 13 (23).
  • [16] Slimane, Fares Ben; Bouguessa, Mohamed. Context Matters: Self-Attention for Sign Language Recognition. 2020 25th International Conference on Pattern Recognition (ICPR), 2021: 7884-7891.
  • [17] Zhou, Haoyi; Xiao, Siyang; Zhang, Shanghang; Peng, Jieqi; Zhang, Shuai; Li, Jianxin. Jump Self-attention: Capturing High-order Statistics in Transformers. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [18] Xu, Mingzhou; Yang, Baosong; Wong, Derek F.; Chao, Lidia S. Multi-view self-attention networks. Knowledge-Based Systems, 2022, 241.
  • [19] Liu, Anqi; Li, Sumei; Chang, Yongli. Image Super-Resolution Using Multi-Resolution Attention Network. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 1610-1614.
  • [20] Gu, Mao; Zhao, Zhou; Jin, Weike; Cai, Deng; Wu, Fei. Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30 (12): 4453-4466.