M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation

Cited by: 1
Authors
Zhao, Jinming [1]
Yang, Hao [1]
Shareghi, Ehsan [1]
Haffari, Gholamreza [1]
Affiliations
[1] Monash Univ, Dept Data Sci & AI, Clayton, Vic, Australia
Source
Keywords
speech translation; modality adaptation
DOI
10.21437/Interspeech.2022-592
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant gap between pre-training and fine-tuning, largely due to the modality differences between the speech outputs of the encoder and the text inputs to the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces features suited to speech-to-text translation by modelling global and local dependencies of the speech sequence. Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU point on the MuST-C En→De dataset.
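The abstract names two ingredients, shrinking the speech sequence and modelling local and global dependencies, without giving architectural details. The toy sketch below only illustrates that general idea and is not the authors' M-Adapter: the function names (`strided_pool`, `self_attention`, `toy_adapter`), the average-pooling window, the parameter-free attention, and the residual connection are all assumptions made for illustration.

```python
import math

def strided_pool(frames, kernel=3, stride=2):
    """Local step (assumed): average each window of `kernel` frames, stepping by `stride`."""
    dim = len(frames[0])
    pooled = []
    for start in range(0, len(frames) - kernel + 1, stride):
        window = frames[start:start + kernel]
        pooled.append([sum(f[d] for f in window) / kernel for d in range(dim)])
    return pooled

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Global step (assumed): scaled dot-product self-attention with no learned weights."""
    dim = len(frames[0])
    attended = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in frames]
        weights = softmax(scores)
        attended.append([sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)])
    return attended

def toy_adapter(frames, kernel=3, stride=2):
    """Shrink the sequence locally, then mix the shorter sequence globally (hypothetical)."""
    pooled = strided_pool(frames, kernel, stride)
    attended = self_attention(pooled)
    # Residual connection: keep the pooled features and add the attended ones.
    return [[p + a for p, a in zip(pf, af)] for pf, af in zip(pooled, attended)]

# Synthetic "speech encoder output": 20 frames with 4 features each.
speech = [[math.sin(0.1 * t * (d + 1)) for d in range(4)] for t in range(20)]
adapted = toy_adapter(speech)
print(len(speech), "->", len(adapted))  # sequence length before and after shrinking
```

With a window of 3 and a stride of 2, the 20 input frames are reduced to 9, which mimics how a shorter, more text-like sequence could be handed to the decoder. A real adapter would use learned projections and trained convolutions rather than fixed averaging.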
Pages: 111-115
Number of pages: 5