Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
|
Authors
Li, Xueqing [1 ]
Li, Shengqiang [1 ]
Zhang, Xiao-Lei [1 ,2 ]
Rahardja, Susanto [1 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding
U.S. National Science Foundation;
Keywords
End-to-end speech translation; rotary position embedding; Transformer;
DOI
10.1109/LSP.2024.3353039
CLC Classification
TM [Electrical technology]; TN [Electronics & communication technology];
Discipline Code
0808 ; 0809 ;
Abstract
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models, as it enables the modeling of dependencies between elements at different positions within the input sequence. Most position embedding methods used in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach that incorporates rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first injects absolute position information by multiplying the input vectors with rotation matrices, and then realizes relative position embedding through the dot-product of the self-attention mechanism. The main advantage of the proposed method is that rotary position embedding combines the benefits of absolute and relative position embedding, making it well suited to position embedding in speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the method without rotary position embedding across eight translation directions.
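The abstract's core mechanism can be illustrated with a minimal sketch of rotary position embedding, assuming the standard pairwise-rotation formulation (consecutive dimension pairs rotated by an angle proportional to the position); the function name and dimensions are illustrative, not taken from the paper:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x (even dim) at position pos.

    Each pair of dimensions (x[2i], x[2i+1]) is rotated by the angle
    pos * base**(-2i/d), encoding absolute position as a rotation.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies, shape (d/2,)
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# The key property: the dot product of a rotated query and key depends
# only on their relative offset, so self-attention scores become
# relative-position aware "for free".
rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
s1 = rope(q, 5) @ rope(k, 2)     # positions 5 and 2, offset 3
s2 = rope(q, 10) @ rope(k, 7)    # positions 10 and 7, same offset 3
assert np.isclose(s1, s2)
```

At position 0 the rotation is the identity, so `rope(x, 0)` returns `x` unchanged; this is how the scheme reduces to no embedding at the sequence start while still injecting relative information into every attention score.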
Pages: 371 - 375
Number of pages: 5
Related Papers
50 records
  • [31] TMSS: An End-to-End Transformer-Based Multimodal Network for Segmentation and Survival Prediction
    Saeed, Numan
    Sobirov, Ikboljon
    Al Majzoub, Roba
    Yaqub, Mohammad
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VII, 2022, 13437 : 319 - 329
  • [32] Transformer-Based End-to-End Classification of Variable-Length Volumetric Data
    Oghbaie, Marzieh
    Araujo, Teresa
    Emre, Taha
    Schmidt-Erfurth, Ursula
    Bogunovic, Hrvoje
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 358 - 367
  • [33] TransOrga: End-To-End Multi-modal Transformer-Based Organoid Segmentation
    Qin, Yiming
    Li, Jiajia
    Chen, Yulong
    Wang, Zikai
    Huang, Yu-An
    You, Zhuhong
    Hu, Lun
    Hu, Pengwei
    Tan, Feng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT III, 2023, 14088 : 460 - 472
  • [34] TOD-Net: An end-to-end transformer-based object detection network
    Sirisha, Museboyina
    Sudha, S. V.
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 108
  • [35] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [36] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228
  • [37] End-to-End Speech Translation with Knowledge Distillation
    Liu, Yuchen
    Xiong, Hao
    Zhang, Jiajun
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    INTERSPEECH 2019, 2019, : 1128 - 1132
  • [38] Adapting Transformer to End-to-end Spoken Language Translation
    Di Gangi, Mattia A.
    Negri, Matteo
    Turchi, Marco
    INTERSPEECH 2019, 2019, : 1133 - 1137
  • [39] Online Compressive Transformer for End-to-End Speech Recognition
    Leong, Chi-Hang
    Huang, Yu-Han
    Chien, Jen-Tzung
    INTERSPEECH 2021, 2021, : 2082 - 2086
  • [40] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075