Direct Segmentation Models for Streaming Speech Translation

Cited by: 0
Authors
Iranzo-Sanchez, Javier [1]
Pastor, Adria Gimenez [1]
Silvestre-Cerda, Joan Albert [1]
Baquero-Arnal, Pau [1]
Civera, Jorge [1]
Juan, Alfons [1]
Affiliations
[1] Univ Politècn València, Machine Learning & Language Proc MLLP Res Grp, Valencian Res Inst Artificial Intelligence VRAIN, Valencia, Spain
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into, hopefully, semantically self-contained chunks to be fed into the MT system. This is especially challenging in the case of streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that incorporate not only textual, but also acoustic information to decide when the ASR output is split into a chunk. An extensive and thorough experimental setup is carried out on the Europarl-ST dataset to prove the contribution of acoustic information to the performance of the segmentation model in terms of BLEU score in a streaming ST scenario. Finally, comparative results with previous work also show the superiority of the segmentation models proposed in this work.
Pages: 2599-2611
Number of pages: 13
Related papers
50 in total
  • [1] Streaming cascade-based speech translation leveraged by a direct segmentation model
    Iranzo-Sánchez, Javier
    Jorge, Javier
    Baquero-Arnal, Pau
    Silvestre-Cerdà, Joan Albert
    Giménez, Adrià
    Civera, Jorge
    Sanchis, Albert
    Juan, Alfons
    [J]. Neural Networks, 2021, 142 : 303 - 315
  • [2] Streaming Models for Joint Speech Recognition and Translation
    Weller, Orion
    Sperber, Matthias
    Gollan, Christian
    Kluivers, Joris
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021 : 2533 - 2539
  • [3] CASCADED MODELS WITH CYCLIC FEEDBACK FOR DIRECT SPEECH TRANSLATION
    Lam, Tsz Kin
    Schamoni, Shigehiko
    Riezler, Stefan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 7508 - 7512
  • [4] Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR
    Chen, Junkun
    Ma, Mingbo
    Zheng, Renjie
    Huang, Liang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021 : 4618 - 4624
  • [5] Automatic Speech Segmentation for Automatic Speech Translation
    Klosowski, Piotr
    Dustor, Adam
    [J]. COMPUTER NETWORKS, CN 2013, 2013, 370 : 466 - 475
  • [6] Segmentation-Free Streaming Machine Translation
    Iranzo-Sanchez, Javier
    Iranzo-Sanchez, Jorge
    Gimenez, Adria
    Civera, Jorge
    Juan, Alfons
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2024, 12 : 1104 - 1121
  • [7] A Faster Approach For Direct Speech to Speech Translation
    Shankarappa, Rashmi T.
    Tiwari, Sourabh
    [J]. 2022 IEEE WOMEN IN TECHNOLOGY CONFERENCE (WINTECHCON): SMARTER TECHNOLOGIES FOR A SUSTAINABLE AND HYPER-CONNECTED WORLD, 2022
  • [8] STREAMING SIMULTANEOUS SPEECH TRANSLATION WITH AUGMENTED MEMORY TRANSFORMER
    Ma, Xutai
    Wang, Yongqiang
    Dousti, Mohammad Javad
    Koehn, Philipp
    Pino, Juan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 7523 - 7527
  • [9] Direct Speech-to-Speech Translation With Discrete Units
    Lee, Ann
    Chen, Peng-Jen
    Wang, Changhan
    Gu, Jiatao
    Popuri, Sravya
    Ma, Xutai
    Polyak, Adam
    Adi, Yossi
    He, Qing
    Tang, Yun
    Pino, Juan
    Hsu, Wei-Ning
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022 : 3327 - 3339
  • [10] Segmentation and Disfluency Removal for Conversational Speech Translation
    Hassan, Hany
    Schwartz, Lee
    Hakkani-Tur, Dilek
    Tur, Gokhan
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014 : 318 - 322