Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
|
Authors
Li, Xueqing [1 ]
Li, Shengqiang [1 ]
Zhang, Xiao-Lei [1 ,2 ]
Rahardja, Susanto [1 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding
U.S. National Science Foundation;
Keywords
End-to-end speech translation; rotary position embedding; Transformer;
DOI
10.1109/LSP.2024.3353039
CLC Classification
TM [Electrical technology]; TN [Electronics & communication technology];
Discipline Code
0808 ; 0809 ;
Abstract
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models, as it enables the modeling of dependencies between elements at different positions within the input sequence. Most position embedding methods used in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach that incorporates rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first injects absolute position information by multiplying the input vectors with rotation matrices, and then realizes relative position embedding through the dot-product of the self-attention mechanism. The main advantage of the proposed method is that rotary position embedding combines the benefits of absolute and relative position embedding, making it well suited to position embedding in speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the method without rotary position embedding across eight translation directions.
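The abstract's core mechanism can be illustrated with a minimal sketch of rotary position embedding, assuming the standard pairwise-rotation formulation (consecutive dimension pairs rotated by an angle proportional to the position); the function name and dimensions are illustrative, not taken from the paper:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x (even dim) at position pos.

    Each pair of dimensions (x[2i], x[2i+1]) is rotated by the angle
    pos * base**(-2i/d), encoding absolute position as a rotation.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies, shape (d/2,)
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# The key property: the dot product of a rotated query and key depends
# only on their relative offset, so self-attention scores become
# relative-position aware "for free".
rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
s1 = rope(q, 5) @ rope(k, 2)     # positions 5 and 2, offset 3
s2 = rope(q, 10) @ rope(k, 7)    # positions 10 and 7, same offset 3
assert np.isclose(s1, s2)
```

At position 0 the rotation is the identity, so `rope(x, 0)` returns `x` unchanged; this is how the scheme reduces to no embedding at the sequence start while still injecting relative information into every attention score.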
Pages: 371 - 375
Number of pages: 5
Related Papers
50 records
  • [31] TMSS: An End-to-End Transformer-Based Multimodal Network for Segmentation and Survival Prediction
    Saeed, Numan
    Sobirov, Ikboljon
    Al Majzoub, Roba
    Yaqub, Mohammad
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VII, 2022, 13437 : 319 - 329
  • [32] Transformer-Based End-to-End Classification of Variable-Length Volumetric Data
    Oghbaie, Marzieh
    Araujo, Teresa
    Emre, Taha
    Schmidt-Erfurth, Ursula
    Bogunovic, Hrvoje
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 358 - 367
  • [33] TransOrga: End-To-End Multi-modal Transformer-Based Organoid Segmentation
    Qin, Yiming
    Li, Jiajia
    Chen, Yulong
    Wang, Zikai
    Huang, Yu-An
    You, Zhuhong
    Hu, Lun
    Hu, Pengwei
    Tan, Feng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT III, 2023, 14088 : 460 - 472
  • [34] TOD-Net: An end-to-end transformer-based object detection network
    Sirisha, Museboyina
    Sudha, S. V.
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 108
  • [35] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [36] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228
  • [37] End-to-End Speech Translation with Knowledge Distillation
    Liu, Yuchen
    Xiong, Hao
    Zhang, Jiajun
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    INTERSPEECH 2019, 2019, : 1128 - 1132
  • [38] Adapting Transformer to End-to-end Spoken Language Translation
    Di Gangi, Mattia A.
    Negri, Matteo
    Turchi, Marco
    INTERSPEECH 2019, 2019, : 1133 - 1137
  • [39] Online Compressive Transformer for End-to-End Speech Recognition
    Leong, Chi-Hang
    Huang, Yu-Han
    Chien, Jen-Tzung
    INTERSPEECH 2021, 2021, : 2082 - 2086
  • [40] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075