CAPTURING MULTI-RESOLUTION CONTEXT BY DILATED SELF-ATTENTION

Cited by: 6
Authors
Moritz, Niko [1 ]
Hori, Takaaki [1 ]
Le Roux, Jonathan [1 ]
Affiliations
[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA
Keywords
dilated self-attention; transformer; automatic speech recognition; computational complexity
DOI
10.1109/ICASSP39728.2021.9415001
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Self-attention has become an important and widely used neural network component that has helped establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where an input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention allows attention to neighboring frames of the query at high resolution, while the dilation mechanism summarizes distant information so that it can be attended to at lower resolution. Different methods for summarizing distant frames are studied, such as subsampling, mean-pooling, and attention-based pooling. ASR results demonstrate substantial improvements over restricted self-attention alone, achieving results similar to full-sequence self-attention at a fraction of the computational cost.
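To illustrate the mechanism described in the abstract, the following is a minimal PyTorch sketch of dilated self-attention, assuming single-head attention and mean-pooling as the summarization method. The function name, the window/dilation parameters, and the detail that the pooled summary spans the whole sequence are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of dilated self-attention: local frames at high
    # resolution plus a mean-pooled low-resolution summary of the sequence.
    # Hypothetical names and parameters; not the authors' implementation.
    import torch
    import torch.nn.functional as F


    def dilated_self_attention(q, k, v, window=16, dilation=8):
        """q, k, v: (T, D) tensors for a single utterance.

        Each query attends at full resolution to frames within +/- `window`
        of its own position, and at lower resolution to the rest of the
        sequence, summarized by mean-pooling every `dilation` frames.
        """
        T, D = q.shape
        scale = D ** -0.5

        # Low-resolution summary: mean-pool keys/values over blocks of
        # `dilation` frames (zero-padding the tail to a full block).
        pad = (-T) % dilation
        k_sum = F.pad(k, (0, 0, 0, pad)).reshape(-1, dilation, D).mean(dim=1)
        v_sum = F.pad(v, (0, 0, 0, pad)).reshape(-1, dilation, D).mean(dim=1)

        outputs = []
        for t in range(T):
            lo, hi = max(0, t - window), min(T, t + window + 1)
            # High-resolution part: the restricted local window around the
            # query. For simplicity the pooled summary spans the whole
            # sequence, so the local window is also covered at low resolution.
            k_all = torch.cat([k[lo:hi], k_sum], dim=0)
            v_all = torch.cat([v[lo:hi], v_sum], dim=0)
            att = F.softmax((q[t] @ k_all.T) * scale, dim=-1)
            outputs.append(att @ v_all)
        return torch.stack(outputs)


    x = torch.randn(200, 64)             # e.g. 200 encoder frames of dim 64
    y = dilated_self_attention(x, x, x)
    print(y.shape)                       # torch.Size([200, 64])

Under these assumptions, each query attends to at most 2*window + 1 + ceil(T/dilation) frames instead of all T frames, which is where the claimed reduction in computational cost comes from.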
Pages: 5869-5873 (5 pages)
Related Papers (50 records)
  • [11] Zhang, Yao; Ma, Yunpu; Seidl, Thomas; Tresp, Volker. Adaptive Multi-Resolution Attention with Linear Complexity. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
  • [12] Kocayusufoglu, Furkan; Wu, Tao; Singh, Anima; Roumpos, Georgios; Cheng, Heng-Tze; Jain, Sagar; Chi, Ed; Singh, Ambuj. Multi-Resolution Attention for Personalized Item Search. WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022: 508-516.
  • [13] Tang, Hao; Liu, Xingwei; Han, Kun; Xie, Xiaohui; Chen, Xuming; Qian, Huang; Liu, Yong; Sun, Shanlin; Bai, Narisu. Spatial Context-Aware Self-Attention Model for Multi-Organ Segmentation. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021), 2021: 938-948.
  • [14] Jin, Weike; Zhao, Zhou; Gu, Mao; Yu, Jun; Xiao, Jun; Zhuang, Yueting. Video Dialog via Multi-Grained Convolutional Self-Attention Context Networks. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), 2019: 465-474.
  • [15] Hou, Peng; Zhang, Jianjie; Jiang, Zhangzheng; Tang, Yiyu; Lin, Ying. A Bearing Fault Diagnosis Method Based on Dilated Convolution and Multi-Head Self-Attention Mechanism. Applied Sciences-Basel, 2023, 13 (23).
  • [16] Slimane, Fares Ben; Bouguessa, Mohamed. Context Matters: Self-Attention for Sign Language Recognition. 2020 25th International Conference on Pattern Recognition (ICPR), 2021: 7884-7891.
  • [17] Zhou, Haoyi; Xiao, Siyang; Zhang, Shanghang; Peng, Jieqi; Zhang, Shuai; Li, Jianxin. Jump Self-attention: Capturing High-order Statistics in Transformers. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [18] Xu, Mingzhou; Yang, Baosong; Wong, Derek F.; Chao, Lidia S. Multi-view self-attention networks. Knowledge-Based Systems, 2022, 241.
  • [19] Liu, Anqi; Li, Sumei; Chang, Yongli. Image Super-Resolution Using Multi-Resolution Attention Network. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 1610-1614.
  • [20] Gu, Mao; Zhao, Zhou; Jin, Weike; Cai, Deng; Wu, Fei. Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30 (12): 4453-4466.