End-to-End Speech Translation with Knowledge Distillation

Cited by: 55

Authors
Liu, Yuchen [1 ,2 ]
Xiong, Hao [4 ]
Zhang, Jiajun [1 ,2 ]
He, Zhongjun [4 ]
Wu, Hua [4 ]
Wang, Haifeng [4 ]
Zong, Chengqing [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, NLPR, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Baidu Inc, 10,Shangdi 10th St, Beijing, Peoples R China
Keywords
Speech recognition; Speech translation; Knowledge distillation; Transformer
DOI
10.21437/Interspeech.2019-2582
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
End-to-end speech translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention in recent years. Compared to conventional pipeline systems, an end-to-end ST model offers potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to train such a model, which combines automatic speech recognition (ASR) and machine translation (MT). In this paper, we propose a knowledge distillation approach that improves ST by transferring knowledge from text translation. Specifically, we first train a text translation model, regarded as the teacher model, and then train the ST model to learn the output probabilities of the teacher through knowledge distillation. Experiments on the English-French Augmented LibriSpeech and English-Chinese TED corpora show that end-to-end ST is feasible for both similar and dissimilar language pairs. In addition, with the guidance of the teacher model, the end-to-end ST model gains significant improvements of over 3.5 BLEU points.
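The distillation objective the abstract describes — the ST student learning the output probabilities of the MT teacher — is commonly implemented as a mix of the usual hard cross-entropy on gold tokens and a soft cross-entropy against the teacher's distribution. The sketch below is a minimal NumPy illustration of that standard formulation, not the authors' code; the `alpha` interpolation weight and `temperature` are illustrative hyperparameters.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the vocabulary axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.5, temperature=1.0):
    """Interpolate hard cross-entropy on gold tokens with soft
    cross-entropy against the teacher's output distribution.

    student_logits, teacher_logits: (num_tokens, vocab_size)
    gold_ids: (num_tokens,) reference token indices
    """
    student_log_probs = np.log(softmax(student_logits, temperature))
    teacher_probs = softmax(teacher_logits, temperature)
    # Soft loss: the teacher's full distribution is the target.
    soft = -(teacher_probs * student_log_probs).sum(axis=-1).mean()
    # Hard loss: one-hot gold target, i.e. standard cross-entropy.
    hard = -student_log_probs[np.arange(len(gold_ids)), gold_ids].mean()
    return alpha * soft + (1 - alpha) * hard
```

With `alpha=0`, this reduces to ordinary maximum-likelihood training of the ST model; raising `alpha` shifts weight toward imitating the teacher's soft output distribution, which carries information about plausible alternative translations that one-hot targets discard.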
Pages: 1128-1132
Page count: 5
Related Papers (50 total)
  • [1] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
    Inaguma, Hirofumi
    Kawahara, Tatsuya
    Watanabe, Shinji
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1872 - 1881
  • [2] End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
    Gaido, Marco
    Di Gangi, Mattia Antonino
    Negri, Matteo
    Turchi, Marco
    [J]. 17TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2020), 2020, : 80 - 88
  • [3] TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition
    Yoon, Ji Won
    Lee, Hyeonseung
    Kim, Hyung Yong
    Cho, Won Ik
    Kim, Nam Soo
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 (29) : 1626 - 1638
  • [4] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription
    Lin, Yuqin
    Wang, Longbiao
    Li, Sheng
    Dang, Jianwu
    Ding, Chenchen
    [J]. INTERSPEECH 2020, 2020, : 4791 - 4795
  • [5] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [6] End-to-end spoofing speech detection and knowledge distillation under noisy conditions
    Liu, Pengfei
    Zhang, Zhenchuan
    Yang, Yingchun
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [8] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    [J]. WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [9] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228
  • [10] Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition
    Kurata, Gakuto
    Saon, George
    [J]. INTERSPEECH 2020, 2020, : 2117 - 2121