LEVERAGING WEAKLY SUPERVISED DATA TO IMPROVE END-TO-END SPEECH-TO-TEXT TRANSLATION

Cited: 0
Authors
Jia, Ye [1 ]
Johnson, Melvin [1 ]
Macherey, Wolfgang [1 ]
Weiss, Ron J. [1 ]
Cao, Yuan [1 ]
Chiu, Chung-Cheng [1 ]
Ari, Naveen [1 ]
Laurenzo, Stella [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
Keywords
Speech translation; sequence-to-sequence model; weakly supervised learning; synthetic training data
DOI
Not available
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora of speech and translated transcript pairs. Previous studies have proposed the use of pre-trained components and multi-task learning in order to benefit from weakly supervised training data, such as speech-to-transcript or text-to-foreign-text pairs. In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning. Furthermore, we demonstrate that a high quality end-to-end ST model can be trained using only weakly supervised datasets, and that synthetic data sourced from unlabeled monolingual text or speech can be used to improve performance. Finally, we discuss methods for avoiding overfitting to synthetic speech with a quantitative ablation study.
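The abstract's core idea — converting weakly supervised data into synthetic (speech, translation) pairs via pre-trained MT and TTS models — can be sketched as the pipeline below. Note this is a minimal illustration, not the paper's actual implementation: `toy_mt` and `toy_tts` are hypothetical stand-ins for the pre-trained models the authors used.

```python
def toy_mt(text: str) -> str:
    """Stand-in for a pre-trained English-to-French MT model (assumption)."""
    lexicon = {"hello": "bonjour", "world": "monde"}
    return " ".join(lexicon.get(w, w) for w in text.lower().split())

def toy_tts(text: str) -> bytes:
    """Stand-in for a TTS synthesizer; returns placeholder 'audio' bytes."""
    return f"<audio:{text}>".encode()

def asr_pairs_to_st(asr_data):
    """Translate ASR transcripts: (speech, transcript) -> (speech, translation)."""
    return [(speech, toy_mt(transcript)) for speech, transcript in asr_data]

def mt_pairs_to_st(mt_data):
    """Synthesize source speech: (src text, tgt text) -> (synthetic speech, tgt text)."""
    return [(toy_tts(src), tgt) for src, tgt in mt_data]

# Build an ST training set entirely from weakly supervised corpora.
asr_corpus = [(b"\x00\x01", "hello world")]      # speech-to-transcript pairs
mt_corpus = [("hello world", "bonjour monde")]   # text-to-foreign-text pairs
st_training = asr_pairs_to_st(asr_corpus) + mt_pairs_to_st(mt_corpus)
```

With real models in place of the stand-ins, both branches yield (speech, translation) pairs that can be mixed with any directly supervised ST data for training.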
Pages: 7180-7184
Page count: 5
Related papers
50 records in total
  • [1] End-to-End Speech-to-Text Translation: A Survey
    Sethiya, Nivedita
    Maurya, Chandresh Kumar
    [J]. Computer Speech and Language, 2025, 90
  • [2] Revisiting End-to-End Speech-to-Text Translation From Scratch
    Zhang, Biao
    Haddow, Barry
    Sennrich, Rico
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [3] ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
    Le, Chenyang
    Qian, Yao
    Zhou, Long
    Liu, Shujie
    Qian, Yanmin
    Zeng, Michael
    Huang, Xuedong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [4] M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
    Zhao, Jinming
    Yang, Hao
    Shareghi, Ehsan
    Haffari, Gholamreza
    [J]. INTERSPEECH 2022, 2022, : 111 - 115
  • [5] TOWARDS END-TO-END SPEECH-TO-TEXT TRANSLATION WITH TWO-PASS DECODING
    Sung, Tzu-Wei
    Liu, Jun-You
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7175 - 7179
  • [6] End-to-end Jordanian dialect speech-to-text self-supervised learning framework
    Safieh, Ali A.
    Abu Alhaol, Ibrahim
    Ghnemat, Rawan
    [J]. FRONTIERS IN ROBOTICS AND AI, 2022, 9
  • [7] Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation
    Dong, Qianqian
    Ye, Rong
    Wang, Mingxuan
    Zhou, Hao
    Xu, Shuang
    Xu, Bo
    Li, Lei
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 12749 - 12759
  • [8] AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation
    Huang, Wuwei
    Wang, Dexin
    Xiong, Deyi
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2539 - 2545
  • [9] Self-Supervised Representations Improve End-to-End Speech Translation
    Wu, Anne
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    [J]. INTERSPEECH 2020, 2020, : 1491 - 1495
  • [10] SpecRec: An Alternative Solution for Improving End-to-End Speech-to-Text Translation via Spectrogram Reconstruction
    Chen, Junkun
    Ma, Mingbo
    Zheng, Renjie
    Huang, Liang
    [J]. INTERSPEECH 2021, 2021, : 2232 - 2236