Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Cited by: 0

Authors:

Han, Yuchen [1]

Xu, Chen [1]

Xiao, Tong [1,2]

Zhu, Jingbo [1,2]

Affiliations:

[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China

[2] NiuTrans Res, Shenyang, Peoples R China

Source:

61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when that model is reused for a low-resource task (E2E ST), it achieves sub-optimal performance due to overfitting. In a case study, we find that regularization plays a more important role than a well-designed modality adaption method, achieving 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.


Pages: 1340-1348

Number of pages: 9
