Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
Authors
Li, Xueqing [1]
Li, Shengqiang [1]
Zhang, Xiao-Lei [1,2]
Rahardja, Susanto [1,3]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding
U.S. National Science Foundation
Keywords
End-to-end speech translation; rotary position embedding; Transformer;
DOI
10.1109/LSP.2024.3353039
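Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract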
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models because it enables the modeling of dependencies between elements at different positions in the input sequence. Most position embedding methods employed in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach that incorporates rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first adds absolute position information by multiplying the input vector with rotation matrices, and then realizes relative position embedding through the dot product in the self-attention mechanism. The main advantage of the proposed method over the original method is that rotary position embedding combines the benefits of absolute and relative position embedding, which makes it well suited to speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the method without rotary position embedding across eight translation directions.
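The mechanism described in the abstract (rotating each vector by a position-dependent angle so that the attention dot product depends only on the position offset) can be illustrated with a small sketch. This is not the authors' implementation: the function names `rotate` and `dot`, the frequency base of 10000, the pairing of adjacent dimensions, and the toy vectors are all assumptions chosen for illustration.

```python
import math

def rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`.

    Hypothetical helper: `base` and the adjacent-pair layout are assumptions.
    """
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # rotation angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2-D rotation
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (made-up values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset (3 positions apart) at two different absolute locations.
score_a = dot(rotate(q, 5), rotate(k, 2))
score_b = dot(rotate(q, 13), rotate(k, 10))

# The two scores agree up to floating-point error.
print(abs(score_a - score_b) < 1e-9)
```

Each vector is rotated using only its own absolute position, yet the resulting score is unchanged when both positions shift by the same amount, which is the sense in which the rotation encodes relative position without an extra learned bias term.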
Pages: 371-375
Number of pages: 5