End-to-End ASR with Adaptive Span Self-Attention

Cited by: 3
Authors:
Chang, Xuankai [1 ]
Subramanian, Aswin Shanmugam [1 ]
Guo, Pengcheng [1 ,2 ]
Watanabe, Shinji [1 ]
Fujita, Yuya [3 ]
Omachi, Motoi [3 ]
Affiliations:
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Yahoo Japan Corp, Tokyo, Japan
Keywords:
Self-attention; adaptive; Transformer; end-to-end; speech recognition;
DOI:
10.21437/Interspeech.2020-2816
Chinese Library Classification:
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes:
100104; 100213
Abstract:
Transformers have demonstrated state-of-the-art performance on many tasks in natural language processing and speech processing. One of the key components in Transformers is self-attention, which attends to the whole input sequence at every layer. However, the computational and memory cost of self-attention is quadratic in the input sequence length, which is a major concern in automatic speech recognition (ASR), where input sequences can be very long. In this paper, we propose to use adaptive span self-attention, a technique originally proposed for language modeling, for ASR tasks. Our method enables the network to learn an appropriate window size and position for each layer and head, and our newly introduced scheme can further control the window size separately for the past and future contexts. Thus, it reduces both computational complexity and memory cost from quadratic in the input length to an adaptively determined linear order. We show the effectiveness of the proposed method on several ASR tasks, where the proposed adaptive span methods consistently improve performance over conventional fixed-span methods.
Pages: 3595-3599
Number of pages: 5
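
The abstract above describes learning, per layer and head, the size and position of an attention window, with separate control over how much past and future context each head sees. Below is a minimal PyTorch sketch of that kind of soft span masking, following the adaptive attention span formulation of Sukhbaatar et al. (2019) on which the method builds; the class name AdaptiveSpanMask, the max_span and ramp parameters, and the symmetric treatment of past and future spans are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class AdaptiveSpanMask(nn.Module):
    """Soft attention-span mask with one learnable span per head for past and future context.

    Illustrative sketch only: names and defaults are assumptions, not the paper's code.
    """

    def __init__(self, num_heads: int, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span = float(max_span)
        self.ramp = float(ramp)
        # Learnable span fractions in [0, 1]; separate parameters let each head
        # restrict past (left) and future (right) context independently.
        self.span_past = nn.Parameter(torch.zeros(num_heads, 1, 1))
        self.span_future = nn.Parameter(torch.zeros(num_heads, 1, 1))

    def _soft_mask(self, span_frac: torch.Tensor, distance: torch.Tensor) -> torch.Tensor:
        # 1 inside the learned span, linear ramp down to 0 over `ramp` frames outside it.
        span = span_frac.clamp(0.0, 1.0) * self.max_span
        return ((span + self.ramp - distance) / self.ramp).clamp(0.0, 1.0)

    def forward(self, attn_scores: torch.Tensor) -> torch.Tensor:
        # attn_scores: (batch, heads, query_len, key_len), pre-softmax logits.
        q_len, k_len = attn_scores.shape[-2], attn_scores.shape[-1]
        q_pos = torch.arange(q_len, device=attn_scores.device).view(-1, 1)
        k_pos = torch.arange(k_len, device=attn_scores.device).view(1, -1)
        offset = (k_pos - q_pos).float()            # < 0: past key, > 0: future key

        mask_past = self._soft_mask(self.span_past, (-offset).clamp(min=0.0))
        mask_future = self._soft_mask(self.span_future, offset.clamp(min=0.0))
        mask = mask_past * mask_future              # (heads, q_len, k_len)

        # Mask the attention distribution and renormalize so each row sums to 1.
        probs = torch.softmax(attn_scores, dim=-1) * mask
        return probs / probs.sum(dim=-1, keepdim=True).clamp(min=1e-8)

For example, AdaptiveSpanMask(num_heads=4, max_span=64) applied to a (batch, 4, T, T) tensor of attention logits returns masked, renormalized attention weights. In the original adaptive-span formulation, an L1 penalty on the learned spans encourages them to shrink during training; that shrinking is what turns the quadratic cost into the adaptive linear order the abstract mentions.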