Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Cited by: 0
Authors
Chen, Junkun [1]
Ma, Mingbo [2]
Zheng, Renjie [2]
Huang, Liang [1, 2]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
[2] Baidu Res, Sunnyvale, CA USA
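Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract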
Simultaneous speech-to-text translation is useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but it suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to translate the source speech directly into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both the cascaded and end-to-end approaches. The key idea is to use two separate but synchronized decoders, one for streaming ASR and one for direct speech-to-text translation (ST); the intermediate results of ASR guide the decoding policy of ST but are not fed to it as input. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
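To make the idea of ASR guiding the ST decoding policy concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a wait-k style rule, which the abstract does not specify: the ST decoder may write its next target token only while the streaming ASR decoder is at least k source words ahead. The function name `asr_guided_wait_k`, the per-chunk word counts, and the pre-computed target tokens are all hypothetical stand-ins for real decoders.

```python
from typing import Iterable, List, Tuple


def asr_guided_wait_k(
    asr_word_counts: Iterable[int],
    target_tokens: List[str],
    k: int,
) -> List[Tuple[int, str]]:
    """Return (chunk_index, token) pairs showing when each target token is written.

    asr_word_counts yields, for each audio chunk, the number of source words
    the streaming ASR decoder has committed so far (hypothetical stand-in).
    target_tokens stands in for the ST decoder's output. Only the ASR word
    count is consulted here; the ASR text itself is never used as ST input.
    """
    emitted: List[Tuple[int, str]] = []
    written = 0
    last_chunk = -1
    for chunk_idx, n_src in enumerate(asr_word_counts):
        last_chunk = chunk_idx
        # WRITE while the ASR is at least k words ahead of the translation;
        # otherwise READ the next audio chunk.
        while written < len(target_tokens) and n_src - written >= k:
            emitted.append((chunk_idx, target_tokens[written]))
            written += 1
    # Source speech has ended: flush the remaining target tokens.
    for tok in target_tokens[written:]:
        emitted.append((last_chunk, tok))
    return emitted


schedule = asr_guided_wait_k([0, 1, 1, 2, 3, 3, 4], ["a", "b", "c", "d"], k=2)
print(schedule)
```

In this toy run the first target token is written only once the ASR has committed two source words. A larger k would delay every write, trading latency for more source context.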
Pages: 4618-4624
Page count: 7