Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Times cited: 0

Authors:

Chen, Junkun [1]

Ma, Mingbo [2]

Zheng, Renjie [2]

Huang, Liang [1,2]

Affiliations:

[1] Oregon State Univ, Corvallis, OR 97331 USA

[2] Baidu Res, Sunnyvale, CA USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021 | 2021

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder due to the combination of two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders on streaming ASR and direct speech-to-text translation (ST), respectively, and the intermediate results of ASR guide the decoding policy of (but are not fed as input to) ST. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
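The abstract describes the synchronized decoding only at a high level. The following is a minimal sketch of how ASR output could drive the ST read/write schedule, assuming a wait-k-style rule in which the number of source words committed by the streaming ASR decoder stands in for the source prefix length. The function name `asr_guided_wait_k`, the callback `st_decode_next`, and the per-chunk word counts are hypothetical illustrations, not the authors' code. In line with the abstract, the ST callback receives only the speech position (chunk index) and its own target prefix, never the ASR words.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical ST decoder interface: given the index of the last speech chunk
# read and the target prefix written so far, return the next target token.
STDecodeFn = Callable[[int, List[str]], str]


def asr_guided_wait_k(
    asr_words_per_chunk: Sequence[int],
    k: int,
    st_decode_next: STDecodeFn,
    max_target_len: int,
    eos: str = "</s>",
) -> List[Tuple[int, str]]:
    """Simulate an ASR-guided wait-k read/write schedule (illustrative sketch).

    asr_words_per_chunk[i] is the number of new source words the streaming ASR
    decoder commits after reading speech chunk i. The ST decoder never sees
    these words; they only decide *when* it may write.
    Returns (chunk_index, target_token) pairs in emission order.
    """
    emitted: List[Tuple[int, str]] = []
    target: List[str] = []
    recognized = 0  # total source words committed by ASR so far
    last_chunk = len(asr_words_per_chunk) - 1

    for i, new_words in enumerate(asr_words_per_chunk):
        recognized += new_words  # READ: consume chunk i; ASR advances
        source_done = i == last_chunk
        # WRITE while the wait-k condition holds on the ASR word count:
        # target token t (1-indexed) is allowed once recognized >= t + k - 1.
        # After the final chunk, the ST decoder finishes the tail freely.
        while len(target) < max_target_len and (
            source_done or recognized >= len(target) + k
        ):
            token = st_decode_next(i, target)  # conditioned on speech, not ASR text
            if token == eos:
                return emitted
            target.append(token)
            emitted.append((i, token))
    return emitted


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]

    def toy_st(chunk_idx: int, prefix: List[str]) -> str:
        # Toy stand-in for the ST decoder: replays a fixed target sentence.
        return reference[len(prefix)] if len(prefix) < len(reference) else "</s>"

    print(asr_guided_wait_k([0, 1, 0, 2, 1], k=2, st_decode_next=toy_st, max_target_len=10))
    # prints [(3, 'a'), (3, 'b'), (4, 'c'), (4, 'd')]
```

In this toy run, no target token is written until the ASR has committed at least k = 2 source words, which first happens after chunk 3. Under the stated assumption, the lag is therefore measured in recognized words rather than in fixed-length speech segments, which is the property the sketch is meant to illustrate.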

Pages: 4618-4624

Number of pages: 7
