Streaming Models for Joint Speech Recognition and Translation

Times cited: 0

Authors:

Weller, Orion [1]

Sperber, Matthias [2]

Gollan, Christian [2]

Kluivers, Joris [2]

Affiliations:

[1] Brigham Young Univ, Provo, UT 84602 USA

[2] Apple, Cupertino, CA USA

Source:

16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021) | 2021

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

End-to-end models for speech translation (ST) have increasingly become the focus of the ST community. These models collapse the previously cascaded pipeline into a single system that converts speech audio directly into translated text. However, cascaded models have the advantage of producing automatic speech recognition (ASR) output, which many practical ST systems need because they display transcripts to the user alongside the translations. To bridge this gap, recent work has made initial progress toward end-to-end models that produce both outputs. However, all previous work has studied this problem only in the consecutive setting, leaving open whether these approaches remain effective in the more challenging streaming setting. We develop an end-to-end streaming ST model based on a re-translation approach and compare it against standard cascaded approaches. We also introduce a novel inference method for the joint case that interleaves transcript and translation during generation, removing the need for separate decoders. Our evaluation across a range of metrics capturing accuracy, latency, and consistency shows that our end-to-end models perform statistically similarly to cascaded models while having half as many parameters. We also find that both systems provide strong translation quality at low latency, retaining 99% of consecutive quality at a lag of just under one second.
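The abstract names three mechanisms: re-translation (re-decoding the whole input prefix each time new audio arrives), a single decoder that interleaves transcript and translation tokens, and a consistency measure for the changing output. The following is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. The stream-tag tokens `<asr>` and `<st>`, the functions `deinterleave`, `retranslate_stream`, `erasure`, and `total_erasure`, and the `translate` callable are all hypothetical. The erasure count (tokens deleted beyond the longest common prefix of consecutive outputs) is one common way to quantify re-translation flicker and may differ from the exact consistency metric used in the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical tags marking which output stream the following tokens belong to;
# the paper's actual interleaving scheme may differ.
SRC_TAG = "<asr>"
TGT_TAG = "<st>"


def deinterleave(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split one interleaved decoder output into (transcript, translation).

    Each non-tag token is routed to the stream named by the most recent tag;
    output is assumed to start in the transcript stream.
    """
    transcript: List[str] = []
    translation: List[str] = []
    current = transcript
    for tok in tokens:
        if tok == SRC_TAG:
            current = transcript
        elif tok == TGT_TAG:
            current = translation
        else:
            current.append(tok)
    return transcript, translation


def retranslate_stream(
    chunks: List[List[str]], translate: Callable[[List[str]], List[str]]
) -> List[List[str]]:
    """Re-translation: after each new input chunk, re-decode the full prefix
    from scratch. `translate` is a hypothetical stand-in for a full model."""
    prefix: List[str] = []
    outputs: List[List[str]] = []
    for chunk in chunks:
        prefix.extend(chunk)
        outputs.append(translate(list(prefix)))  # copy so outputs never alias prefix
    return outputs


def erasure(prev: List[str], new: List[str]) -> int:
    """Number of tokens of `prev` that must be deleted to display `new`,
    i.e. everything after their longest common prefix."""
    common = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        common += 1
    return len(prev) - common


def total_erasure(hypotheses: List[List[str]]) -> int:
    """Sum erasure over each pair of consecutive re-translation outputs."""
    return sum(erasure(p, n) for p, n in zip(hypotheses, hypotheses[1:]))
```

For example, `deinterleave(["<asr>", "hallo", "<st>", "hello", "<asr>", "welt", "<st>", "world"])` returns `(["hallo", "welt"], ["hello", "world"])`, and the output sequence `[["a"], ["a", "b"], ["a", "c", "d"]]` has a total erasure of 1, because only `"b"` is withdrawn. Re-translation trades this kind of flicker for the ability to revise earlier output, which is why consistency is reported alongside accuracy and latency.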

Pages: 2533-2539

Number of pages: 7

Related records (50 in total):

[1] STREAMING JOINT SPEECH RECOGNITION AND DISFLUENCY DETECTION
Futami, Hayato
Tsunoo, Emiru
Shibata, Kentaro
Kashiwagi, Yosuke
Okuda, Takao
Arora, Siddhant
Watanabe, Shinji
[J]. arXiv, 2022
[2] Direct Segmentation Models for Streaming Speech Translation
Iranzo-Sanchez, Javier
Pastor, Adria Gimenez
Silvestre-Cerda, Joan Albert
Baquero-Arnal, Pau
Civera, Jorge
Juan, Alfons
[J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 2599-2611
[3] Joint Speech Translation and Named Entity Recognition
Gaido, Marco
Papi, Sara
Negri, Matteo
Turchi, Marco
[J]. INTERSPEECH 2023, 2023: 47-51
[4] STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS
Moritz, Niko
Hori, Takaaki
Le Roux, Jonathan
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 936-943
[5] Joint streaming model for backchannel prediction and automatic speech recognition
Choi, Yong-Seok
Bang, Jeong-Uk
Kim, Seung Hi
[J]. ETRI JOURNAL, 2024, 46(01): 118-126
[6] Streaming Multi-talker Speech Recognition with Joint Speaker Identification
Lu, Liang
Kanda, Naoyuki
Li, Jinyu
Gong, Yifan
[J]. INTERSPEECH 2021, 2021: 1782-1786
[7] A COMPARISON OF STREAMING MODELS AND DATA AUGMENTATION METHODS FOR ROBUST SPEECH RECOGNITION
Kim, Jiyeon
Kumar, Mehul
Gowda, Dhananjaya
Garg, Abhinav
Kim, Chanwoo
[J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 989-995
[8] JOINT LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING
Bayer, Ali Orkan
Riccardi, Giuseppe
[J]. 2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 199-203
[9] Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Zhang, C.
Li, B.
Sainath, T. N.
Strohman, T.
Mavandadi, S.
Chang, S.
Haghani, P.
[J]. INTERSPEECH 2022, 2022: 3223-3227
[10] VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation
Wang, Tianrui
Zhou, Long
Zhang, Ziqiang
Wu, Yu
Liu, Shujie
Gaur, Yashesh
Chen, Zhuo
Li, Jinyu
Wei, Furu
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3709-3716
