Direct Segmentation Models for Streaming Speech Translation

Cited by: 0

Authors:

Iranzo-Sanchez, Javier [1]

Pastor, Adria Gimenez [1]

Silvestre-Cerda, Joan Albert [1]

Baquero-Arnal, Pau [1]

Civera, Jorge [1]

Juan, Alfons [1]

Affiliations:

[1] Univ Politècn València, Machine Learning & Language Proc MLLP Res Grp, Valencian Res Inst Artificial Intelligence VRAIN, Valencia, Spain

Source:

PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into, hopefully, semantically self-contained chunks to be fed into the MT system. This is especially challenging in the case of streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that incorporate not only textual, but also acoustic information to decide when the ASR output is split into a chunk. An extensive and thorough experimental setup is carried out on the Europarl-ST dataset to prove the contribution of acoustic information to the performance of the segmentation model in terms of BLEU score in a streaming ST scenario. Finally, comparative results with previous work also show the superiority of the segmentation models proposed in this work.

Pages: 2599-2611

Page count: 13
