Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Cited by: 0
Authors
Han, Yuchen [1 ]
Xu, Chen [1 ]
Xiao, Tong [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Institutions
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Source
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2 | 2023
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonly observed "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning and does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when that model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to overfitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaption method, achieving 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.
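The abstract's central claim is that overfitting, not the modality mismatch, is what hurts when a large pre-trained model is reused on a small ST fine-tuning set, and that regularization counteracts it. A minimal, framework-free sketch of one such regularizer (L2 weight decay in plain gradient descent) illustrates the mechanism; this toy example is not from the paper's released code, and all names in it are illustrative:

```python
# Illustrative only: L2 weight decay on a toy one-parameter least-squares fit.
# With decay, the learned weight is pulled toward zero, which is the general
# mechanism regularization uses to keep an over-capacity model from
# overfitting a small ("low-resource") dataset.

def fit(xs, ys, weight_decay, lr=0.1, steps=200):
    w = 5.0  # deliberately oversized start, mimicking reuse of a big pre-trained model
    for _ in range(steps):
        # gradient of 0.5 * mean((w*x - y)^2), plus the L2 penalty term
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * (grad + weight_decay * w)
    return w

xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]  # tiny fine-tuning "dataset"
w_plain = fit(xs, ys, weight_decay=0.0)
w_reg = fit(xs, ys, weight_decay=0.5)
print(abs(w_reg) < abs(w_plain))  # decay shrinks the fitted weight
```

In practice the same idea appears in fine-tuning recipes as dropout, label smoothing, or decoupled weight decay in the optimizer rather than the explicit penalty term above.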
Pages: 1340 - 1348
Page count: 9
Related Papers
50 total
  • [41] Self-Supervised Representations Improve End-to-End Speech Translation
    Wu, Anne
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    INTERSPEECH 2020, 2020, : 1491 - 1495
  • [42] SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
    Ma, Xutai
    Pino, Juan
    Koehn, Philipp
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 582 - 587
  • [43] END-TO-END SPEECH TRANSLATION WITH SELF-CONTAINED VOCABULARY MANIPULATION
    Tu, Mei
    Zhang, Fan
    Liu, Wei
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7929 - 7933
  • [44] Revisiting End-to-End Speech-to-Text Translation From Scratch
    Zhang, Biao
    Haddow, Barry
    Sennrich, Rico
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [45] MuST-C: A multilingual corpus for end-to-end speech translation
    Cattoni, Roldano
    Di Gangi, Mattia Antonino
    Bentivogli, Luisa
    Negri, Matteo
    Turchi, Marco
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [46] Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution
    Ko, Yuka
    Sudoh, Katsuhito
    Sakti, Sakriani
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (10) : 1322 - 1331
  • [47] Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
    Nguyen, Ha
    Esteve, Yannick
    Besacier, Laurent
    INTERSPEECH 2021, 2021, : 2371 - 2375
  • [48] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
    Inaguma, Hirofumi
    Kawahara, Tatsuya
    Watanabe, Shinji
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1872 - 1881
  • [49] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [50] CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation
    Lei, Yikun
    Xue, Zhengshan
    Sun, Haoran
    Zhao, Xiaohu
    Zhu, Shaolin
    Lin, Xiaodong
    Xiong, Deyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3123 - 3137