End-to-End ASR with Adaptive Span Self-Attention

Cited by: 3
Authors
Chang, Xuankai [1 ]
Subramanian, Aswin Shanmugam [1 ]
Guo, Pengcheng [1 ,2 ]
Watanabe, Shinji [1 ]
Fujita, Yuya [3 ]
Omachi, Motoi [3 ]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Yahoo Japan Corp, Tokyo, Japan
Source
INTERSPEECH 2020
Keywords
Self-attention; adaptive; Transformer; end-to-end; speech recognition;
DOI
10.21437/Interspeech.2020-2816
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Classification Codes
100104; 100213;
Abstract
Transformers have demonstrated state-of-the-art performance on many tasks in natural language processing and speech processing. One of the key components of the Transformer is self-attention, which attends to the whole input sequence at every layer. However, the computational and memory cost of self-attention is quadratic in the input sequence length, which is a major concern in automatic speech recognition (ASR), where the input sequence can be very long. In this paper, we propose to use adaptive span self-attention for ASR tasks, a technique originally proposed for language modeling. Our method enables the network to learn an appropriate size and position of the attention window for each layer and head, and our newly introduced scheme can further control the window size depending on the future and past contexts. Thus, it reduces both the computational complexity and the memory requirement from quadratic in the input length to adaptively linear. We show the effectiveness of the proposed method on several ASR tasks, where the proposed adaptive span methods consistently improve performance over conventional fixed-span methods.
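
The record contains only the abstract, so as a rough illustration of the mechanism it describes, the following is a minimal PyTorch-style sketch of the soft span masking behind adaptive span attention (following Sukhbaatar et al.'s formulation for language modeling), with separate learnable past and future spans as one possible reading of the context-dependent window control mentioned above. The class name AdaptiveSpanMask, its parameters, and the past/future split are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class AdaptiveSpanMask(nn.Module):
        # Soft mask over attention weights with a learnable span per head.
        # Positions farther away than the learned span are ramped down to zero
        # over a margin of `ramp` frames, which keeps the span differentiable.
        # Separate past/future spans illustrate context-dependent window control
        # (an assumption about the scheme, not the authors' exact recipe).
        def __init__(self, num_heads: int, max_span: int, ramp: int = 32):
            super().__init__()
            self.max_span = max_span
            self.ramp = ramp
            # Learnable span ratios in [0, 1], one per head and direction.
            self.span_past = nn.Parameter(torch.zeros(num_heads, 1, 1))
            self.span_future = nn.Parameter(torch.zeros(num_heads, 1, 1))

        def forward(self, attn: torch.Tensor) -> torch.Tensor:
            # attn: (batch, num_heads, query_len, key_len) softmax outputs.
            q_len, k_len = attn.shape[-2], attn.shape[-1]
            q_pos = torch.arange(q_len, device=attn.device).view(1, 1, q_len, 1)
            k_pos = torch.arange(k_len, device=attn.device).view(1, 1, 1, k_len)
            dist = (k_pos - q_pos).float()  # > 0: future frames, < 0: past frames

            z_past = self.span_past.clamp(0, 1) * self.max_span
            z_future = self.span_future.clamp(0, 1) * self.max_span
            # Linear ramp: 1 inside the span, 0 beyond span + ramp.
            mask = torch.where(
                dist <= 0,
                ((self.ramp + z_past + dist) / self.ramp).clamp(0, 1),
                ((self.ramp + z_future - dist) / self.ramp).clamp(0, 1),
            )
            attn = attn * mask
            # Renormalize so each query's weights still sum to one.
            return attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)

In a multi-head attention layer, such a mask would be applied to the softmax output, e.g. attn = mask(scores.softmax(dim=-1)); an L1 penalty on span_past and span_future during training encourages short spans, which is what would yield the adaptively linear cost discussed in the abstract.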
Pages: 3595-3599
Page count: 5
Related papers
50 records in total
  • [41] STREAMING BILINGUAL END-TO-END ASR MODEL USING ATTENTION OVER MULTIPLE SOFTMAX
    Patil, Aditya
    Joshi, Vikas
    Agrawal, Purvi
    Mehta, Rupesh
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 252 - 259
  • [42] Towards Lifelong Learning of End-to-end ASR
    Chang, Heng-Jui
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. INTERSPEECH 2021, 2021, : 2551 - 2555
  • [43] Contextual Biasing for End-to-End Chinese ASR
    Zhang, Kai
    Zhang, Qiuxia
    Wang, Chung-Che
    Jang, Jyh-Shing Roger
    [J]. IEEE ACCESS, 2024, 12 : 92960 - 92975
  • [44] Multi-Scale Visual Semantics Aggregation with Self-Attention for End-to-End Image-Text Matching
    Zheng, Zhuobin
    Ben, Youcheng
    Yuan, Chun
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 940 - 955
  • [45] End-to-End Topic Classification without ASR
    Dong, Zexian
    Liu, Jia
    Zhang, Wei-Qiang
    [J]. 2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [46] Phonemic competition in end-to-end ASR models
    ten Bosch, Louis
    Bentum, Martijn
    Boves, Lou
    [J]. INTERSPEECH 2023, 2023, : 586 - 590
  • [47] UNSUPERVISED MODEL ADAPTATION FOR END-TO-END ASR
    Sivaraman, Ganesh
    Casal, Ricardo
    Garland, Matt
    Khoury, Elie
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6987 - 6991
  • [48] Hash Self-Attention End-to-End Network for Sketch-Based 3D Shape Retrieval
    Zhao X.
    Pan X.
    Liu F.
    Zhang S.
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (05): : 798 - 805
  • [49] UNSUPERVISED SPEAKER ADAPTATION USING ATTENTION-BASED SPEAKER MEMORY FOR END-TO-END ASR
    Sari, Leda
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7384 - 7388
  • [50] AN END-TO-END SPEECH ACCENT RECOGNITION METHOD BASED ON HYBRID CTC/ATTENTION TRANSFORMER ASR
    Gao, Qiang
    Wu, Haiwei
    Sun, Yanqing
    Duan, Yitao
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7253 - 7257