Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Cited: 0
Authors
Zhao, Chendong [1 ,2 ]
Wang, Jianzong [1 ]
Wei, Wenqi [1 ]
Qu, Xiaoyang [1 ]
Wang, Haoqian [2 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Keywords
Automatic Speech Recognition; Sparse Attention; Monotonic Attention; Self-Attention
DOI
10.1109/DSAA54385.2022.10032360
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer architecture, built on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be easily applied to streaming or online ASR. For self-attention in Transformer ASR, the softmax-normalized attention mechanism assigns nonzero weight to every frame, which makes it impossible to highlight the important speech information. For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments across different heads. To overcome these two limitations, we integrate sparse attention and monotonic attention into Transformer-based ASR. The sparse mechanism introduces a learned sparsity scheme that lets each self-attention structure fit its corresponding head better. The monotonic attention applies regularization to prune redundant heads from the multi-head attention structure. Experiments show that our method effectively improves the attention mechanism on widely used speech recognition benchmarks.
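The abstract gives no implementation details, but the sparse mechanism it describes is commonly realized by replacing softmax with a sparse normalizer such as sparsemax (Martins & Astudillo, 2016), which can assign exactly zero weight to irrelevant frames. The PyTorch sketch below is a minimal illustration under that assumption; the paper's adaptive, learned per-head sparsity scheme is not reproduced here, and `sparse_attention` is a hypothetical helper, not the authors' code.

```python
import torch

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Sparsemax over the last dim: like softmax, but can assign
    exactly zero weight to low-scoring keys, giving sparse attention."""
    z, _ = torch.sort(scores, dim=-1, descending=True)      # scores, descending
    cumsum = z.cumsum(dim=-1)
    k = torch.arange(1, scores.size(-1) + 1,
                     device=scores.device, dtype=scores.dtype)
    support = 1.0 + k * z > cumsum                          # support-set test
    k_z = support.sum(dim=-1, keepdim=True)                 # support size |S(z)|
    tau = (cumsum.gather(-1, k_z - 1) - 1.0) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)               # project onto the simplex

def sparse_attention(query, key, value):
    """Hypothetical scaled dot-product attention with sparsemax in
    place of softmax (an assumption, not the paper's exact scheme)."""
    scores = query @ key.transpose(-2, -1) / query.size(-1) ** 0.5
    weights = sparsemax(scores)        # many entries are exactly 0
    return weights @ value, weights

# Example: (batch, heads, frames, head_dim) tensors.
q = torch.randn(2, 4, 50, 64)
k = torch.randn(2, 4, 80, 64)
v = torch.randn(2, 4, 80, 64)
out, w = sparse_attention(q, k, v)     # out: (2, 4, 50, 64)
```

In the same spirit, a simple distance-from-diagonal penalty hints at how regularization can bias heads toward monotonic, speech-like alignments; the paper's actual regularizer and its head-pruning criterion may differ.

```python
def diagonal_penalty(weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical monotonicity regularizer: penalize attention mass
    that lies far from the time-aligned diagonal of each head."""
    t_q, t_k = weights.shape[-2], weights.shape[-1]
    q_pos = torch.linspace(0.0, 1.0, t_q, device=weights.device).unsqueeze(-1)
    k_pos = torch.linspace(0.0, 1.0, t_k, device=weights.device).unsqueeze(0)
    distance = (q_pos - k_pos).abs()   # 0 on the diagonal, grows off it
    return (weights * distance).sum(dim=(-2, -1)).mean()

loss_reg = diagonal_penalty(w)         # add to the training loss, suitably scaled
```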
Pages: 173-180
Page count: 8
Related papers
50 records in total
  • [1] Transformer-Based Turkish Automatic Speech Recognition
    Tasar, Davut Emre
    Koruyan, Kutan
    Cilgin, Cihan
    ACTA INFOLOGICA, 2024, 8 (01): 1 - 10
  • [2] A window attention based Transformer for Automatic Speech Recognition
    Feng, Zhao
    Li, Yongming
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 449 - 454
  • [3] MONOTONIC SEGMENTAL ATTENTION FOR AUTOMATIC SPEECH RECOGNITION
    Zeyer, Albert
    Schmitt, Robin
    Zhou, Wei
    Schlueter, Ralf
    Ney, Hermann
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 229 - 236
  • [4] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [5] A transformer-based network for speech recognition
    Tang L.
    International Journal of Speech Technology, 2023, 26 (02) : 531 - 539
  • [6] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): 3471 - 3487
  • [7] Transformer-Based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
    Lehecka, Jan
    Psutka, Josef V.
    Psutka, Josef
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 301 - 312
  • [8] Monotonic Gaussian regularization of attention for robust automatic speech recognition
    Du, Yeqian
    Wu, Minghui
    Fang, Xin
    Yang, Zhouwang
    COMPUTER SPEECH AND LANGUAGE, 2023, 77
  • [9] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [10] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81