TASK AWARE MULTI-TASK LEARNING FOR SPEECH TO TEXT TASKS

Cited by: 7
Authors:
Indurthi, Sathish [1 ]
Zaidi, Mohd Abbas [1 ]
Lakumarapu, Nikhil Kumar [1 ]
Lee, Beomseok [1 ]
Han, Hyojung [1 ]
Ahn, Seokchan [1 ]
Kim, Sangha [1 ]
Kim, Chanwoo [1 ]
Hwang, Inchul [1 ]
Affiliation:
[1] Samsung Res, Seoul, South Korea
Keywords:
Speech Translation; Speech Recognition; Task Modulation; Multitask Learning;
DOI:
10.1109/ICASSP39728.2021.9414703
CLC classification:
O42 [Acoustics];
Discipline codes:
070206 ; 082403 ;
Abstract:
In general, direct speech-to-text translation (ST) models are trained jointly with automatic speech recognition (ASR) and machine translation (MT) tasks. However, current joint learning strategies inhibit knowledge transfer across these tasks. We propose a task modulation network that allows the model to learn task-specific features while simultaneously learning shared features. The proposed approach removes the need for a separate fine-tuning step, yielding a single model that performs all of these tasks. This single model achieves a 28.64 BLEU score on the ST MuST-C English-German task, a WER of 11.61% on the ASR TED-LIUM v3 task, and a 23.35 BLEU score on the MT WMT'15 English-German task. This sets a new state-of-the-art (SOTA) on the ST task while outperforming existing end-to-end ASR systems.
Pages: 7723-7727 (5 pages)
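The abstract describes a task modulation network that conditions shared features on the task at hand (ST, ASR, or MT). The paper's exact architecture is not given here, so the following is only a minimal illustrative sketch, assuming a FiLM-style mechanism: a shared projection whose output is scaled and shifted by per-task parameters. All names, shapes, and the modulation form are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

TASKS = ["asr", "st", "mt"]  # the three jointly trained tasks
HIDDEN = 8                   # toy feature dimension

# One projection shared across all tasks (the "shared features").
W_shared = rng.normal(size=(HIDDEN, HIDDEN))

# Per-task modulation parameters: a scale (gamma) and shift (beta)
# vector for each task (the "task-specific features").
gamma = {t: rng.normal(size=HIDDEN) for t in TASKS}
beta = {t: rng.normal(size=HIDDEN) for t in TASKS}

def modulated_forward(x: np.ndarray, task: str) -> np.ndarray:
    """Apply the shared layer, then task-specific scale and shift."""
    h = np.tanh(x @ W_shared)            # shared representation
    return gamma[task] * h + beta[task]  # task-conditioned output

# A toy batch of 2 feature vectors produces different outputs per task,
# even though the shared weights are identical.
x = rng.normal(size=(2, HIDDEN))
out_st = modulated_forward(x, "st")
out_asr = modulated_forward(x, "asr")
```

Because only the small gamma/beta vectors differ across tasks, a single set of shared weights can serve all three tasks without a separate fine-tuning stage, which is the property the abstract claims for the proposed single model.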