A TIME-RESTRICTED SELF-ATTENTION LAYER FOR ASR

Cited: 0
Authors
Povey, Daniel [1 ,2 ]
Hadian, Hossein [1 ]
Ghahremani, Pegah [1 ]
Li, Ke [1 ]
Khudanpur, Sanjeev [1 ,2 ]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Funding
U.S. National Science Foundation;
Keywords
ASR; attention; lattice-free MMI; neural network; LSTM;
DOI
Not available
CLC classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Self-attention - an attention mechanism where the input and output sequence lengths are the same - has recently been successfully applied to machine translation, caption generation, and phoneme recognition. In this paper we apply a restricted self-attention mechanism (with multiple heads) to speech recognition. By "restricted" we mean that the mechanism at a particular frame only sees input from a limited number of frames to the left and right. Restricting the context makes it easier to encode the position of the input - we use a 1-hot encoding of the frame offset. We try introducing attention layers into TDNN architectures, and replacing LSTM layers with attention layers in TDNN+LSTM architectures. We show experiments on a number of ASR setups. We observe improvements compared to the TDNN and TDNN+LSTM baselines. Attention layers are also faster than LSTM layers at test time, since they lack recurrence.
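The mechanism the abstract describes can be illustrated in a few lines: each output frame attends only to a window of frames to its left and right, and a 1-hot encoding of the frame offset gives each head access to relative position. The sketch below is a minimal NumPy illustration, not the paper's Kaldi implementation: the projection matrices are random stand-ins for learned parameters, and appending the one-hot offset to the keys is one plausible reading of the paper's frame-offset encoding.

```python
import numpy as np

def restricted_self_attention(x, left=3, right=3, num_heads=2, seed=0):
    """Toy time-restricted multi-head self-attention (NumPy sketch).

    Output frame t attends only to input frames in [t-left, t+right];
    a one-hot encoding of the frame offset is appended to each key so
    the heads can distinguish relative position within the window.
    """
    T, d = x.shape
    assert d % num_heads == 0
    d_head = d // num_heads
    ctx = left + right + 1  # window width = number of possible offsets
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned parameters.
    Wq = rng.standard_normal((num_heads, d, d_head)) / np.sqrt(d)
    Wk = rng.standard_normal((num_heads, d + ctx, d_head)) / np.sqrt(d)
    Wv = rng.standard_normal((num_heads, d, d_head)) / np.sqrt(d)

    out = np.zeros((T, d))
    for t in range(T):
        lo, hi = max(0, t - left), min(T, t + right + 1)
        window = x[lo:hi]                       # (W, d) visible frames
        # One-hot frame offsets, shifted into [0, ctx), appended to keys.
        offsets = np.arange(lo, hi) - t + left
        onehot = np.eye(ctx)[offsets]           # (W, ctx)
        keys_in = np.concatenate([window, onehot], axis=1)
        head_outs = []
        for h in range(num_heads):
            q = x[t] @ Wq[h]                    # (d_head,)
            K = keys_in @ Wk[h]                 # (W, d_head)
            V = window @ Wv[h]                  # (W, d_head)
            scores = K @ q / np.sqrt(d_head)
            w = np.exp(scores - scores.max())   # softmax over the window
            w /= w.sum()
            head_outs.append(w @ V)
        out[t] = np.concatenate(head_outs)
    return out
```

Because each output frame depends only on its window, perturbing a frame outside that window leaves the output at that frame unchanged, which is also why such a layer has bounded latency, unlike an unrestricted attention layer or a backward-direction LSTM.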
Pages: 5874-5878
Page count: 5
Related papers
50 records in total
  • [1] Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition
    Wu, Long
    Li, Ta
    Wang, Li
    Yan, Yonghong
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (21):
  • [2] SELF-ATTENTION WITH RESTRICTED TIME CONTEXT AND RESOLUTION IN DNN SPEECH ENHANCEMENT
    Strake, Maximilian
    Behlke, Adrian
    Fingscheidt, Tim
    [J]. 2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [3] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    [J]. INTERSPEECH 2020, 2020, : 3595 - 3599
  • [4] SELF-ATTENTION ALIGNER: A LATENCY-CONTROL END-TO-END MODEL FOR ASR USING SELF-ATTENTION NETWORK AND CHUNK-HOPPING
    Dong, Linhao
    Wang, Feng
    Xu, Bo
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5656 - 5660
  • [5] Self-attention with Functional Time Representation Learning
    Xu, Da
    Ruan, Chuanwei
    Kumar, Sushant
    Korpeoglu, Evren
    Achan, Kannan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] Self-attention CNN for retinal layer segmentation in OCT
    Cao, Guogang
    Wu, Yan
    Peng, Zeyu
    Zhou, Zhilin
    Dai, Cuixia
    [J]. BIOMEDICAL OPTICS EXPRESS, 2024, 15 (03) : 1605 - 1617
  • [7] Revolutionizing Time Series Data Preprocessing with a Novel Cycling Layer in Self-Attention Mechanisms †
    Chen, Jiyan
    Yang, Zijiang
    [J]. APPLIED SCIENCES (SWITZERLAND), 2024, 14 (19):
  • [8] END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION
    Sharma, Roshan
    Palaskar, Shruti
    Black, Alan W.
    Metze, Florian
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8072 - 8076
  • [9] The benefits of time-restricted eating
    Anna Kriebs
    [J]. Nature Reviews Endocrinology, 2020, 16 : 68 - 68
  • [10] Measurable time-restricted sensitivity
    Aiello, Domenico
    Diao, Hansheng
    Fan, Zhou
    King, Daniel O.
    Lin, Jessica
    Silva, Cesar E.
    [J]. NONLINEARITY, 2012, 25 (12) : 3313 - 3325