An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition

Cited by: 2
Authors
Yue, Fengpeng [1 ]
Ko, Tom [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Keywords
end-to-end; transformer; positional embedding; self-attention;
DOI
10.1109/ISCSLP49672.2021.9362093
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the Transformer architecture, the model does not intrinsically learn the ordering of the input frames and tokens because of its self-attention mechanism. In sequence-to-sequence learning tasks, this missing ordering information is explicitly supplied by a positional representation. Currently, there are two major ways of using positional representations: the absolute way and the relative way. In both, the positional information is represented by a positional vector. In this paper, we propose the use of a positional matrix in the context of relative positional encoding. Instead of adding vectors to the key vectors in the self-attention layer, our method transforms the key vectors according to their positions. Experiments on the LibriSpeech dataset show that our approach outperforms the positional vector approach.
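The core idea in the abstract can be illustrated with a toy attention-score computation: rather than adding a relative-position vector to each key, the key at relative offset (j - i) is linearly transformed by a matrix chosen for that offset. This is a minimal sketch under assumed shapes and names (`W_rel`, single head, no softmax), not the paper's exact formulation.

```python
import numpy as np

def relative_attention_scores(Q, K, W_rel):
    """Toy single-head attention scores with a positional matrix.

    Q, K:   (T, d) query/key matrices for a length-T sequence
    W_rel:  (2T-1, d, d) one d-by-d transform per relative offset,
            indexed so that offset (j - i) maps to row (j - i) + (T - 1)

    Each key is transformed by the matrix for its offset from the query
    position before the dot product, instead of having a relative-position
    vector added to it.
    """
    T, d = Q.shape
    scores = np.empty((T, T))
    for i in range(T):
        for j in range(T):
            offset = (j - i) + (T - 1)            # shift offset into [0, 2T-2]
            k_transformed = W_rel[offset] @ K[j]  # position-dependent key
            scores[i, j] = Q[i] @ k_transformed / np.sqrt(d)
    return scores
```

As a sanity check, setting every `W_rel[offset]` to the identity matrix reduces this to vanilla scaled dot-product attention scores, `Q @ K.T / sqrt(d)`.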
Pages: 5
Related Papers
50 records in total
  • [1] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [2] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 3471 - 3487
  • [3] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    [J]. INTERSPEECH 2020, 2020, : 5011 - 5015
  • [4] On-device Streaming Transformer-based End-to-End Speech Recognition
    Oh, Yoo Rhee
    Park, Kiyoung
    [J]. INTERSPEECH 2021, 2021, : 967 - 968
  • [5] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [6] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [7] A study of transformer-based end-to-end speech recognition system for Kazakh language
    Mamyrbayev, Orken
    Oralbekova, Dina
    Alimhan, Keylan
    Turdalykyzy, Tolganay
    Othman, Mohamed
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [8] TRANSFORMER-BASED ONLINE CTC/ATTENTION END-TO-END SPEECH RECOGNITION ARCHITECTURE
    Miao, Haoran
    Cheng, Gaofeng
    Gao, Changfeng
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6084 - 6088
  • [10] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    [J]. PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695