Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Cited by: 0
Authors
Chen, Junkun [1]
Ma, Mingbo [2]
Zheng, Renjie [2]
Huang, Liang [1, 2]
Affiliations
[1] Oregon State Univ, Corvallis, OR 97331 USA
[2] Baidu Res, Sunnyvale, CA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Simultaneous speech-to-text translation is useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to translate the source speech directly into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders on streaming ASR and direct speech-to-text translation (ST), respectively; the intermediate results of ASR guide the decoding policy of ST but are not fed to it as input. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
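The abstract states only that intermediate ASR results guide the ST decoding policy without serving as ST input. As a minimal illustration of what such a policy could look like, the sketch below assumes a wait-k style rule in which the number of tokens committed so far by the streaming ASR decoder determines how many target tokens the ST decoder may emit. The wait-k rule, the function names, and the `asr_counts` input are assumptions made for illustration and are not taken from the record.

```python
from typing import List


def allowed_target_tokens(num_asr_tokens: int, k: int,
                          source_finished: bool, max_len: int) -> int:
    """Hypothetical wait-k budget driven by the ASR token count.

    The ST decoder may have written at most (num_asr_tokens - k + 1)
    target tokens while speech is still arriving, and may finish the
    whole output once the source has ended.
    """
    if source_finished:
        return max_len
    return max(0, num_asr_tokens - k + 1)


def simulate(asr_counts: List[int], k: int, target_len: int) -> List[str]:
    """Return the READ/WRITE action sequence for a toy utterance.

    asr_counts[t] is the (assumed) number of tokens the streaming ASR
    decoder has committed after consuming speech chunk t. Only this
    count is consulted; the ASR text itself is never passed to ST.
    """
    actions: List[str] = []
    written = 0
    for t, n in enumerate(asr_counts):
        actions.append("READ")  # consume one more speech chunk
        finished = t == len(asr_counts) - 1
        budget = min(target_len,
                     allowed_target_tokens(n, k, finished, target_len))
        while written < budget:
            actions.append("WRITE")  # emit one target token
            written += 1
    return actions


if __name__ == "__main__":
    print(simulate([0, 1, 1, 2, 3], k=2, target_len=4))
```

In this toy run the ST decoder stays silent until the ASR decoder has committed two tokens, which is how an ASR-derived count can pace translation without the transcript being used as translation input.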
Pages: 4618-4624
Page count: 7
Related papers
50 in total
  • [21] CPT: CROSS-MODAL PREFIX-TUNING FOR SPEECH-TO-TEXT TRANSLATION
    Ma, Yukun
    Trung Hieu Nguyen
    Ma, Bin
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6217 - 6221
  • [22] ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
    Le, Chenyang
    Qian, Yao
    Zhou, Long
    Liu, Shujie
    Qian, Yanmin
    Zeng, Michael
    Huang, Xuedong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] STREAMING SIMULTANEOUS SPEECH TRANSLATION WITH AUGMENTED MEMORY TRANSFORMER
    Ma, Xutai
    Wang, Yongqiang
    Dousti, Mohammad Javad
    Koehn, Philipp
    Pino, Juan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7523 - 7527
  • [24] Transcribing paralinguistic acoustic cues to target language text in transformer-based speech-to-text translation
    Tokuyama, Hirotaka
    Sakti, Sakriani
    Sudoh, Katsuhito
    Nakamura, Satoshi
    [J]. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2021, 5 : 3976 - 3980
  • [25] Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-based Speech-to-Text Translation
    Tokuyama, Hirotaka
    Sakti, Sakriani
    Sudoh, Katsuhito
    Nakamura, Satoshi
    [J]. INTERSPEECH 2021, 2021, : 2262 - 2266
  • [26] Comparative Analysis of Models for Neural Machine Speech-to-Text Translation for Turkic State Languages
    Nurmaganbet, Dauren
    Tukeyev, Ualsher
    Shormakova, Assem
    Zhumanov, Zhandos
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796 : 360 - 371
  • [27] Bridging the cross-modal gap using adversarial training for speech-to-text translation
    Zhang, Hao
    Yang, Xukui
    Qu, Dan
    Li, Zhen
    [J]. DIGITAL SIGNAL PROCESSING, 2022, 131
  • [28] M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
    Zhao, Jinming
    Yang, Hao
    Shareghi, Ehsan
    Haffari, Gholamreza
    [J]. INTERSPEECH 2022, 2022, : 111 - 115
  • [29] LEVERAGING WEAKLY SUPERVISED DATA TO IMPROVE END-TO-END SPEECH-TO-TEXT TRANSLATION
    Jia, Ye
    Johnson, Melvin
    Macherey, Wolfgang
    Weiss, Ron J.
    Cao, Yuan
    Chiu, Chung-Cheng
    Ari, Naveen
    Laurenzo, Stella
    Wu, Yonghui
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7180 - 7184
  • [30] Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
    Duquenne, Paul-Ambroise
    Schwenk, Holger
    Sagot, Benoit
    [J]. INTERSPEECH 2023, 2023, : 32 - 36