Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Cited by: 0
Authors
Han, Yuchen [1 ]
Xu, Chen [1 ]
Xiao, Tong [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Institutions
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Source
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2 | 2023
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonly observed "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning and does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when that model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to overfitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaption method, achieving 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.
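The abstract's central claim is that overfitting, not the modality mismatch, is what hurts when a large pre-trained model is reused on a small ST fine-tuning set, and that regularization counteracts it. A minimal, framework-free sketch of one such regularizer (L2 weight decay in plain gradient descent) illustrates the mechanism; this toy example is not from the paper's released code, and all names in it are illustrative:

```python
# Illustrative only: L2 weight decay on a toy one-parameter least-squares fit.
# With decay, the learned weight is pulled toward zero, which is the general
# mechanism regularization uses to keep an over-capacity model from
# overfitting a small ("low-resource") dataset.

def fit(xs, ys, weight_decay, lr=0.1, steps=200):
    w = 5.0  # deliberately oversized start, mimicking reuse of a big pre-trained model
    for _ in range(steps):
        # gradient of 0.5 * mean((w*x - y)^2), plus the L2 penalty term
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * (grad + weight_decay * w)
    return w

xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]  # tiny fine-tuning "dataset"
w_plain = fit(xs, ys, weight_decay=0.0)
w_reg = fit(xs, ys, weight_decay=0.5)
print(abs(w_reg) < abs(w_plain))  # decay shrinks the fitted weight
```

In practice the same idea appears in fine-tuning recipes as dropout, label smoothing, or decoupled weight decay in the optimizer rather than the explicit penalty term above.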
Pages: 1340 - 1348
Page count: 9
Related Papers
50 total
  • [41] Self-Supervised Representations Improve End-to-End Speech Translation
    Wu, Anne
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    INTERSPEECH 2020, 2020, : 1491 - 1495
  • [42] SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
    Ma, Xutai
    Pino, Juan
    Koehn, Philipp
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 582 - 587
  • [43] END-TO-END SPEECH TRANSLATION WITH SELF-CONTAINED VOCABULARY MANIPULATION
    Tu, Mei
    Zhang, Fan
    Liu, Wei
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7929 - 7933
  • [44] Revisiting End-to-End Speech-to-Text Translation From Scratch
    Zhang, Biao
    Haddow, Barry
    Sennrich, Rico
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [45] MuST-C: A multilingual corpus for end-to-end speech translation
    Cattoni, Roldano
    Di Gangi, Mattia Antonino
    Bentivogli, Luisa
    Negri, Matteo
    Turchi, Marco
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [46] Neural End-To-End Speech Translation Leveraged by ASR Posterior Distribution
    Ko, Yuka
    Sudoh, Katsuhito
    Sakti, Sakriani
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (10) : 1322 - 1331
  • [47] Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
    Nguyen, Ha
    Esteve, Yannick
    Besacier, Laurent
    INTERSPEECH 2021, 2021, : 2371 - 2375
  • [48] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
    Inaguma, Hirofumi
    Kawahara, Tatsuya
    Watanabe, Shinji
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1872 - 1881
  • [49] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [50] CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation
    Lei, Yikun
    Xue, Zhengshan
    Sun, Haoran
    Zhao, Xiaohu
    Zhu, Shaolin
    Lin, Xiaodong
    Xiong, Deyi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3123 - 3137