Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Cited: 0
Authors
Zhao, Chendong [1 ,2 ]
Wang, Jianzong [1 ]
Wei, Wenqi [1 ]
Qu, Xiaoyang [1 ]
Wang, Haoqian [2 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Keywords
Automatic Speech Recognition; Sparse Attention; Monotonic Attention; Self-Attention
DOI
10.1109/DSAA54385.2022.10032360
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer architecture, built on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be easily applied to streaming or online ASR. For self-attention in Transformer ASR, the softmax-normalized attention mechanism assigns nonzero weight to every frame, which makes it impossible to highlight the important speech information. For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments across different heads. To overcome these two limitations, we integrate sparse attention and monotonic attention into Transformer-based ASR. The sparse mechanism introduces a learned sparsity scheme that lets each self-attention structure fit its corresponding head better. The monotonic attention applies regularization to prune redundant heads from the multi-head attention structure. Experiments show that our method effectively improves the attention mechanism on widely used speech recognition benchmarks.
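The abstract gives no implementation details, but the sparse mechanism it describes is commonly realized by replacing softmax with a sparse normalizer such as sparsemax (Martins & Astudillo, 2016), which can assign exactly zero weight to irrelevant frames. The PyTorch sketch below is a minimal illustration under that assumption; the paper's adaptive, learned per-head sparsity scheme is not reproduced here, and `sparse_attention` is a hypothetical helper, not the authors' code.

```python
import torch

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Sparsemax over the last dim: like softmax, but can assign
    exactly zero weight to low-scoring keys, giving sparse attention."""
    z, _ = torch.sort(scores, dim=-1, descending=True)      # scores, descending
    cumsum = z.cumsum(dim=-1)
    k = torch.arange(1, scores.size(-1) + 1,
                     device=scores.device, dtype=scores.dtype)
    support = 1.0 + k * z > cumsum                          # support-set test
    k_z = support.sum(dim=-1, keepdim=True)                 # support size |S(z)|
    tau = (cumsum.gather(-1, k_z - 1) - 1.0) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)               # project onto the simplex

def sparse_attention(query, key, value):
    """Hypothetical scaled dot-product attention with sparsemax in
    place of softmax (an assumption, not the paper's exact scheme)."""
    scores = query @ key.transpose(-2, -1) / query.size(-1) ** 0.5
    weights = sparsemax(scores)        # many entries are exactly 0
    return weights @ value, weights

# Example: (batch, heads, frames, head_dim) tensors.
q = torch.randn(2, 4, 50, 64)
k = torch.randn(2, 4, 80, 64)
v = torch.randn(2, 4, 80, 64)
out, w = sparse_attention(q, k, v)     # out: (2, 4, 50, 64)
```

In the same spirit, a simple distance-from-diagonal penalty hints at how regularization can bias heads toward monotonic, speech-like alignments; the paper's actual regularizer and its head-pruning criterion may differ.

```python
def diagonal_penalty(weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical monotonicity regularizer: penalize attention mass
    that lies far from the time-aligned diagonal of each head."""
    t_q, t_k = weights.shape[-2], weights.shape[-1]
    q_pos = torch.linspace(0.0, 1.0, t_q, device=weights.device).unsqueeze(-1)
    k_pos = torch.linspace(0.0, 1.0, t_k, device=weights.device).unsqueeze(0)
    distance = (q_pos - k_pos).abs()   # 0 on the diagonal, grows off it
    return (weights * distance).sum(dim=(-2, -1)).mean()

loss_reg = diagonal_penalty(w)         # add to the training loss, suitably scaled
```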
Pages: 173-180
Page count: 8
Related papers
50 records in total
  • [1] Transformer-Based Turkish Automatic Speech Recognition
    Tasar, Davut Emre
    Koruyan, Kutan
    Cilgin, Cihan
    ACTA INFOLOGICA, 2024, 8 (01): 1 - 10
  • [2] A window attention based Transformer for Automatic Speech Recognition
    Feng, Zhao
    Li, Yongming
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 449 - 454
  • [3] MONOTONIC SEGMENTAL ATTENTION FOR AUTOMATIC SPEECH RECOGNITION
    Zeyer, Albert
    Schmitt, Robin
    Zhou, Wei
    Schlueter, Ralf
    Ney, Hermann
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 229 - 236
  • [4] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [5] A transformer-based network for speech recognition
    Tang L.
    International Journal of Speech Technology, 2023, 26 (02) : 531 - 539
  • [6] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): 3471 - 3487
  • [7] Transformer-Based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
    Lehecka, Jan
    Psutka, Josef V.
    Psutka, Josef
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 301 - 312
  • [8] Monotonic Gaussian regularization of attention for robust automatic speech recognition
    Du, Yeqian
    Wu, Minghui
    Fang, Xin
    Yang, Zhouwang
    COMPUTER SPEECH AND LANGUAGE, 2023, 77
  • [9] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [10] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81