End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Authors
Gaido, Marco [1, 2]
Di Gangi, Mattia Antonino [1, 2]
Negri, Matteo [1 ]
Turchi, Marco [1 ]
Affiliations
[1] Fondazione Bruno Kessler, Trento, Italy
[2] University of Trento, Trento, Italy
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes FBK's participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems' ability to translate the audio of English TED talks into German text. The test talks are provided in two versions: one already segmented with automatic tools, and one as raw audio without any segmentation. Participants can decide whether to use a custom segmentation; we used the provided one. Our system is an end-to-end model based on an adaptation of the Transformer to speech data. Its training process is the main focus of this paper and relies on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multitask learning using the CTC loss. Finally, once training with word-level knowledge distillation is complete, our ST models are fine-tuned using label-smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, a strong result compared with recent papers, and 23.7 BLEU on the same data segmented with VAD, which shows the need for research on solutions that address this specific data condition.
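The two training objectives named in the abstract, word-level knowledge distillation followed by fine-tuning with label-smoothed cross entropy, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, the uniform-mixing form of label smoothing, and the smoothing value `epsilon=0.1` are all assumptions made for illustration. In word-level distillation the student is trained to match the teacher's full output distribution at every target position, rather than a single reference token.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def word_level_kd_loss(student_logits, teacher_probs):
    """Hypothetical helper: mean cross entropy between the teacher's
    per-position distribution and the student's prediction."""
    total = 0.0
    for s_logits, t_probs in zip(student_logits, teacher_probs):
        log_q = log_softmax(s_logits)
        total -= sum(p * lq for p, lq in zip(t_probs, log_q))
    return total / len(student_logits)


def label_smoothed_ce(student_logits, targets, epsilon=0.1):
    """Hypothetical helper: mean label-smoothed cross entropy, mixing the
    reference token with a uniform distribution (epsilon is an assumed value)."""
    total = 0.0
    for s_logits, y in zip(student_logits, targets):
        log_q = log_softmax(s_logits)
        nll = -log_q[y]                     # loss on the reference token
        uniform = -sum(log_q) / len(log_q)  # loss against a uniform target
        total += (1.0 - epsilon) * nll + epsilon * uniform
    return total / len(student_logits)


# Toy example: 2 target positions, vocabulary of 3 tokens (made-up numbers).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
teacher = [[0.7, 0.2, 0.1], [0.05, 0.05, 0.9]]
references = [0, 2]

kd = word_level_kd_loss(student, teacher)
lsce = label_smoothed_ce(student, references)
print(f"word-level KD loss: {kd:.4f}")
print(f"label-smoothed CE:  {lsce:.4f}")
```

When the teacher distribution is one-hot on the reference token, the distillation loss reduces to ordinary cross entropy, which is the same as the label-smoothed loss with `epsilon=0`.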
Pages: 80-88
Page count: 9