Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2

Authors:

Li, Xueqing [1]

Li, Shengqiang [1]

Zhang, Xiao-Lei [1,2]

Rahardja, Susanto [1,3]

Affiliations:

[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China

[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China

[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Funding:

National Science Foundation (USA);

Keywords:

End-to-end speech translation; rotary position embedding; Transformer;

DOI:

10.1109/LSP.2024.3353039

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models as it facilitates the modeling of dependencies between elements at various positions within the input sequence. Most position embedding methods employed in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach by incorporating rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first adds absolute position information by multiplying the input vector by rotation matrices, and then implements relative position embedding through the dot product of the self-attention mechanism. The main advantage of the proposed method over the original method is that rotary position embedding combines the benefits of absolute and relative position embedding, which makes it well suited to speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the method without rotary position embedding across eight translation directions.
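The mechanism the abstract describes (rotate each vector by a position-dependent angle, then let the attention dot product expose only the relative offset) can be illustrated with a minimal pure-Python sketch. This is not the authors' RoPE-ST implementation: the function names `rope_rotate` and `attention_score`, the toy vectors, and the frequency base of 10000 (the value commonly used in the general rotary position embedding formulation) are assumptions made for illustration only.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Hypothetical helper: pair k (elements 2k, 2k+1) is rotated by
    pos * base**(-2k/d), where d = len(vec). `base` is an assumed value.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)        # i = 2k, so this is base**(-2k/d)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation applied to the pair (x, y).
        out.extend([x * c - y * s, x * s + y * c])
    return out


def attention_score(q, k, q_pos, k_pos):
    """Unscaled dot product of a rotated query and a rotated key."""
    rq = rope_rotate(q, q_pos)
    rk = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]   # toy query vector (assumed values)
    k = [1.5, 0.3, -0.7, 1.0]    # toy key vector (assumed values)
    # Both position pairs have the same offset (2), so the scores agree
    # up to floating-point error.
    print(attention_score(q, k, 3, 1))
    print(attention_score(q, k, 10, 8))
```

Because each rotation is orthogonal, the product of the rotation at position m (transposed) with the rotation at position n collapses to a single rotation by the offset n - m. That is why the two printed scores match even though the absolute positions differ.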

Pages: 371-375

Page count: 5
