Curriculum Pre-training for End-to-End Speech Translation

Cited by: 0
Authors
Wang, Chengyi [1 ]
Wu, Yu [2 ]
Liu, Shujie [2 ]
Zhou, Ming [2 ]
Yang, Zhenglu [1 ]
Affiliations
[1] Nankai Univ, Tianjin, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MODELS
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
End-to-end speech translation places a heavy burden on the encoder, which must transcribe, understand, and learn cross-lingual semantics simultaneously. To obtain a powerful encoder, traditional methods pre-train it on ASR data to capture speech features. However, we argue that pre-training the encoder only through simple speech recognition is not enough, and that high-level linguistic knowledge should also be considered. Motivated by this, we propose a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words between the two languages, with the difficulty of the courses increasing gradually. Experiments show that our curriculum pre-training method leads to significant improvements on En-De and En-Fr speech translation benchmarks.
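To make the staged idea concrete, the following is a minimal, hypothetical sketch (Python/PyTorch) of curriculum-style encoder pre-training, written only from the abstract above: one shared speech encoder is trained through three courses of increasing difficulty (frame-level transcription, utterance-level understanding, cross-lingual word mapping) before being paired with a translation decoder. The encoder architecture, course heads, losses, and synthetic data are illustrative assumptions, not the paper's implementation.

# Hypothetical sketch of curriculum pre-training for a speech encoder.
# All module names, heads, losses, and data are illustrative assumptions.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Toy acoustic encoder shared across all courses."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)

    def forward(self, feats):            # feats: (batch, frames, feat_dim)
        out, _ = self.rnn(feats)
        return out                       # (batch, frames, 2 * hidden)

def frame_loss(logits, labels):
    # Per-frame classification: transcription and word-mapping courses.
    return nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())

def utterance_loss(logits, labels):
    # Utterance-level classification: the understanding course (mean pooling).
    return nn.functional.cross_entropy(logits.mean(dim=1), labels)

def run_course(encoder, head, loss_fn, make_batch, steps=5, lr=1e-3):
    # Train the shared encoder plus a course-specific head on synthetic batches.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(steps):
        feats, labels = make_batch()
        loss = loss_fn(head(encoder(feats)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

B, T, SRC_V, TGT_V, CLS = 4, 50, 100, 120, 10   # toy batch and vocabulary sizes
encoder = SpeechEncoder()

# Courses ordered from easy to hard; each reuses the encoder and swaps the head.
courses = [
    # 1) Elementary: transcribe source-language tokens frame by frame (ASR-like).
    (nn.Linear(256, SRC_V), frame_loss,
     lambda: (torch.randn(B, T, 80), torch.randint(SRC_V, (B, T)))),
    # 2) Advanced: understand the whole utterance (utterance-level label).
    (nn.Linear(256, CLS), utterance_loss,
     lambda: (torch.randn(B, T, 80), torch.randint(CLS, (B,)))),
    # 3) Advanced: map speech frames to target-language words (cross-lingual).
    (nn.Linear(256, TGT_V), frame_loss,
     lambda: (torch.randn(B, T, 80), torch.randint(TGT_V, (B, T)))),
]

for head, loss_fn, make_batch in courses:
    run_course(encoder, head, loss_fn, make_batch)
# The pre-trained encoder would then be fine-tuned with a translation decoder
# on end-to-end speech translation data.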
Pages: 3728-3738
Page count: 11
Related Papers
50 items in total
  • [1] Nguyen, Ha; Bougares, Fethi; Tomashenko, Natalia; Esteve, Yannick; Besacier, Laurent. Investigating Self-supervised Pre-training for End-to-end Speech Translation. INTERSPEECH 2020, 2020: 1466-1470.
  • [2] Wang, Chengyi; Wu, Yu; Liu, Shujie; Yang, Zhenglu; Zhou, Ming. Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020), 34: 9161-9168.
  • [3] Lugosch, Loren; Ravanelli, Mirco; Ignoto, Patrick; Tomar, Vikrant Singh; Bengio, Yoshua. Speech Model Pre-training for End-to-End Spoken Language Understanding. INTERSPEECH 2019, 2019: 814-818.
  • [4] Qian, Yao; Bian, Ximo; Shi, Yu; Kanda, Naoyuki; Shen, Leo; Xiao, Zhen; Zeng, Michael. Speech-Language Pre-training for End-to-End Spoken Language Understanding. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021): 7458-7462.
  • [5] Li, Xuancai; Chen, Kehai; Zhao, Tiejun; Yang, Muyun. End-to-End Speech Translation with Adversarial Training. Workshop on Automatic Simultaneous Translation: Challenges, Recent Advances, and Future Directions, 2020: 10-14.
  • [6] Ao, Junyi; Zhang, Ziqiang; Zhou, Long; Liu, Shujie; Li, Haizhou; Ko, Tom; Dai, Lirong; Li, Jinyu; Qian, Yao; Wei, Furu. Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. INTERSPEECH 2022, 2022: 2658-2662.
  • [7] Hu, Hu; Zhao, Rui; Li, Jinyu; Lu, Liang; Gong, Yifan. Exploring Pre-training with Alignments for RNN Transducer Based End-to-End Speech Recognition. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020): 7079-7083.
  • [8] Pino, Juan; Xu, Qiantong; Ma, Xutai; Dousti, Mohammad Javad; Tang, Yun. Self-Training for End-to-End Speech Translation. INTERSPEECH 2020, 2020: 1476-1480.
  • [9] Misra, Ananya; Hwang, Dongseong; Huo, Zhouyuan; Garg, Shefali; Siddhartha, Nikhil; Narayanan, Arun; Sim, Khe Chai. A Comparison of Supervised and Unsupervised Pre-training of End-to-End Models. INTERSPEECH 2021, 2021: 731-735.
  • [10] Chen, Yi-Chen; Yang, Zhaojun; Yeh, Ching-Feng; Jain, Mahaveer; Seltzer, Michael L. AIPNet: Generative Adversarial Pre-training of Accent-Invariant Networks for End-to-End Speech Recognition. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020): 6979-6983.