Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Cited by: 0
Authors
Chen, Junkun [1]
Ma, Mingbo [2]
Zheng, Renjie [2]
Huang, Liang [1, 2]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
[2] Baidu Res, Sunnyvale, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Simultaneous speech-to-text translation is useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders on streaming ASR and direct speech-to-text translation (ST), respectively; the intermediate results of ASR guide the decoding policy of ST but are not fed to it as input. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
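The synchronization idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not specify the decoding policy, so a wait-k style rule is assumed here, and the names `synchronized_decode`, `asr_counts`, `st_step`, and `toy_st_step` are hypothetical. The only role of the ASR side is to report how many source words it has recognized so far; the ST step never receives the ASR text.

```python
from typing import Callable, List, Sequence


def synchronized_decode(
    asr_counts: Sequence[int],
    st_step: Callable[[int, List[str]], str],
    k: int,
    eos: str = "</s>",
    max_len: int = 50,
) -> List[List[str]]:
    """Return the target tokens emitted after each audio chunk.

    asr_counts[t]: number of source words the streaming ASR decoder has
        recognized after chunk t (hypothetical input for this sketch).
    st_step(t, prefix): stand-in for the ST decoder; it sees the audio up
        to chunk t and the target prefix, but never the ASR transcript.
    k: assumed wait-k lag between ASR word count and target length.
    """
    target: List[str] = []
    emitted: List[List[str]] = []
    last = len(asr_counts) - 1
    for t, n_src in enumerate(asr_counts):
        new: List[str] = []
        finished = bool(target) and target[-1] == eos
        # Assumed policy: write while the target lags the ASR word count
        # by at least k; once the audio has ended, write until EOS.
        while (
            not finished
            and len(target) < max_len
            and (t == last or n_src - len(target) >= k)
        ):
            tok = st_step(t, target)
            target.append(tok)
            new.append(tok)
            finished = tok == eos
        emitted.append(new)
    return emitted


def toy_st_step(t: int, prefix: List[str]) -> str:
    # Toy ST decoder: replays a fixed translation one token at a time.
    words = ["ich", "bin", "hier", "</s>"]
    return words[len(prefix)] if len(prefix) < len(words) else "</s>"


# ASR word counts after each of five audio chunks (made-up values).
schedule = synchronized_decode([0, 1, 2, 2, 3], toy_st_step, k=2)
```

With `k=2`, nothing is written until the ASR side has recognized two words, and the remaining tokens are flushed once the last chunk arrives. A smaller `k` trades translation context for lower latency.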
Pages: 4618-4624
Number of pages: 7