Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Times Cited: 0
Authors
Han, Yuchen [1 ]
Xu, Chen [1 ]
Xiao, Tong [1 ,2 ]
Zhu, Jingbo [1 ,2 ]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China
[2] NiuTrans Res, Shenyang, Peoples R China
Keywords: (none listed)
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonly cited "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning and does not have a major impact on final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when that model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to overfitting. In a case study, we find that regularization plays a more important role than a well-designed modality adaption method, yielding BLEU scores of 29.0 for En-De and 40.3 for En-Fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.
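The abstract's central claim, that stronger regularization during fine-tuning matters more than modality adaption, can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' TAB implementation: it assumes a pre-trained encoder-decoder exposed through a hypothetical `model(speech, target_in)` interface and shows two common regularizers, raised dropout and label smoothing, applied at ST fine-tuning time. All names and hyperparameter values are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' method): counter the
# "capacity gap" by regularizing a large pre-trained ASR/MT model more
# aggressively when fine-tuning it on the low-resource E2E ST task.
import torch
import torch.nn as nn


def raise_dropout(model: nn.Module, p: float = 0.3) -> None:
    """Increase dropout in every Dropout layer of a pre-trained model."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p


def st_fine_tune_step(model, batch, optimizer, pad_idx: int) -> float:
    """One ST fine-tuning step with label smoothing as a second regularizer."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx, label_smoothing=0.1)
    # `model(speech, target_in)` is an assumed interface returning
    # (batch, tgt_len, vocab) logits.
    logits = model(batch["speech"], batch["target_in"])
    loss = criterion(logits.reshape(-1, logits.size(-1)),
                     batch["target_out"].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping: a third common safeguard against unstable
    # fine-tuning of an over-capacity model on scarce ST data.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

Under this reading, the regularizers (dropout, label smoothing, clipping) are ordinary off-the-shelf techniques; the paper's point is that tuning them well can matter more than a purpose-built modality adaption module.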
Pages: 1340-1348
Page Count: 9