Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Times cited: 0

Authors:

Chen, Junkun [1]

Ma, Mingbo [2]

Zheng, Renjie [2]

Huang, Liang [1, 2]

Affiliations:

[1] Oregon State Univ, Corvallis, OR 97331 USA

[2] Baidu Res, Sunnyvale, CA USA

Source:

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (2021)

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate but synchronized decoders, one for streaming ASR and one for direct speech-to-text translation (ST), where the intermediate results of ASR guide (but are not fed as input to) the decoding policy of ST. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
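The abstract's key idea is that the streaming ASR decoder decides when the ST decoder may write, without supplying the ST decoder's input. The following minimal Python sketch illustrates that idea. It assumes a wait-k style policy counted in ASR-recognized source words. This is an assumption: the abstract does not name the policy. The names read_write_actions, asr_word_counts, k, and target_len are hypothetical. Fixing target_len in advance is also a simplification, since a real ST decoder would stop when it emits end-of-sentence.

```python
from typing import List


def read_write_actions(asr_word_counts: List[int], k: int, target_len: int) -> List[str]:
    """Simulate a READ/WRITE schedule for the ST decoder (illustrative sketch).

    asr_word_counts[t] is the number of source words the synchronized
    streaming ASR decoder has committed after reading speech chunk t.
    Only this count steers the ST decoder; the ASR text itself is never
    given to the ST decoder as input.
    Hypothetical wait-k rule (assumption): target word i (1-indexed) may be
    written once at least i + k - 1 source words have been recognized.
    """
    actions: List[str] = []
    written = 0
    for n_src in asr_word_counts:
        actions.append("READ")  # consume the next speech chunk
        # Write as many target words as the ASR-derived budget allows.
        while written < target_len and written < n_src - k + 1:
            actions.append("WRITE")
            written += 1
    # End of speech: flush the remaining target words.
    actions.extend(["WRITE"] * (target_len - written))
    return actions


if __name__ == "__main__":
    print(read_write_actions([1, 2, 3, 3], k=2, target_len=3))
```

Running it prints ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE']. With k = 2, the first target word waits until the ASR has committed two source words. Writing stalls on the last chunk because no new source word arrives. The final word is then flushed at the end of speech.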

Pages: 4618-4624

Number of pages: 7