Investigating Self-supervised Pre-training for End-to-end Speech Translation

Citations: 15
Authors
Nguyen, Ha [1,2]
Bougares, Fethi [3]
Tomashenko, Natalia [2]
Estève, Yannick [2]
Besacier, Laurent [1]
Affiliations
[1] Univ Grenoble Alpes, LIG, Grenoble, France
[2] Avignon Univ, LIA, Avignon, France
[3] Le Mans Univ, LIUM, Le Mans, France
Keywords
self-supervised learning from speech; automatic speech translation; end-to-end models; low resource settings
DOI
10.21437/Interspeech.2020-1835
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes
100104; 100213
Abstract
Self-supervised learning from raw speech has proven beneficial for improving automatic speech recognition (ASR). Here, we investigate its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained on unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly effective in low-resource settings, and that fine-tuning CPC models on the AST training data further improves performance. Even in higher-resource settings, ensembling AST models trained on filter-bank and CPC representations yields near state-of-the-art models without any ASR pre-training. This may be particularly beneficial when developing a system that translates speech in a language with a poorly standardized orthography, or even speech in an unwritten language.
Pages: 1466-1470
Page count: 5
Related Papers
50 items in total
  • [1] Curriculum Pre-training for End-to-End Speech Translation
    Wang, Chengyi
    Wu, Yu
    Liu, Shujie
    Zhou, Ming
    Yang, Zhenglu
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3728 - 3738
  • [2] Self-Supervised Representations Improve End-to-End Speech Translation
    Wu, Anne
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    [J]. INTERSPEECH 2020, 2020, : 1491 - 1495
  • [3] A comparison of supervised and unsupervised pre-training of end-to-end models
    Misra, Ananya
    Hwang, Dongseong
    Huo, Zhouyuan
    Garg, Shefali
    Siddhartha, Nikhil
    Narayanan, Arun
    Sim, Khe Chai
    [J]. INTERSPEECH 2021, 2021, : 731 - 735
  • [4] Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
    Wang, Chengyi
    Wu, Yu
    Liu, Shujie
    Yang, Zhenglu
    Zhou, Ming
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9161 - 9168
  • [5] Self-Training for End-to-End Speech Translation
    Pino, Juan
    Xu, Qiantong
    Ma, Xutai
    Dousti, Mohammad Javad
    Tang, Yun
    [J]. INTERSPEECH 2020, 2020, : 1476 - 1480
  • [6] Speech Model Pre-training for End-to-End Spoken Language Understanding
    Lugosch, Loren
    Ravanelli, Mirco
    Ignoto, Patrick
    Tomar, Vikrant Singh
    Bengio, Yoshua
    [J]. INTERSPEECH 2019, 2019, : 814 - 818
  • [7] An Exploration of Self-supervised Pretrained Representations for End-to-End Speech Recognition
    Chang, Xuankai
    Maekaku, Takashi
    Guo, Pengcheng
    Shi, Jing
    Lu, Yen-Ju
    Subramanian, Aswin Shanmugam
    Wang, Tianzi
    Yang, Shu-wen
    Tsao, Yu
    Lee, Hung-yi
    Watanabe, Shinji
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 228 - 235
  • [8] Reducing Domain Mismatch in Self-supervised Speech Pre-training
    Baskar, Murali Karthick
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Zhang, Yu
    [J]. INTERSPEECH 2022, 2022, : 3028 - 3032
  • [9] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    [J]. INTERSPEECH 2022, 2022, : 3819 - 3823
  • [10] Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
    Popuri, Sravya
    Chen, Peng-Jen
    Wang, Changhan
    Pino, Juan
    Adi, Yossi
    Gu, Jiatao
    Hsu, Wei-Ning
    Lee, Ann
    [J]. INTERSPEECH 2022, 2022, : 5195 - 5199