Direct Segmentation Models for Streaming Speech Translation

Cited by: 0
Authors
Iranzo-Sanchez, Javier [1 ]
Pastor, Adria Gimenez [1 ]
Silvestre-Cerda, Joan Albert [1 ]
Baquero-Arnal, Pau [1 ]
Civera, Jorge [1 ]
Juan, Alfons [1 ]
Affiliations
[1] Univ Politecn Valencia, Machine Learning & Language Proc MLLP Res Grp, Valencian Res Inst Artificial Intelligence VRAIN, Valencia, Spain
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The cascade approach to Speech Translation (ST) is based on a pipeline in which an Automatic Speech Recognition (ASR) system is followed by a Machine Translation (MT) system. The two systems are usually connected by a segmenter that splits the ASR output into chunks, ideally semantically self-contained, that are fed into the MT system. This is especially challenging in streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that use not only textual but also acoustic information to decide when to split the ASR output into a chunk. Extensive experiments on the Europarl-ST dataset demonstrate the contribution of acoustic information to segmentation performance, measured by BLEU score in a streaming ST scenario. Finally, comparative results show that the proposed segmentation models outperform those of previous work.
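The cascade described in the abstract (ASR output, then a segmenter, then MT) can be illustrated with a minimal sketch. Everything here is a hypothetical toy, not the models proposed in the paper: the `Token` type, the `split_score` heuristic, the function-word list, the 0.3 s pause cutoff, and the weights and threshold are all assumptions chosen only to show how a textual cue and an acoustic cue might be combined into a streaming split decision, with a maximum chunk length acting as a crude latency bound.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical textual cue: chunks rarely end on a function word.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


@dataclass
class Token:
    word: str           # word emitted by an (assumed) streaming ASR system
    pause_after: float  # silence following the word, in seconds (toy acoustic feature)


def split_score(token: Token) -> float:
    """Toy stand-in for a learned segmentation model: mixes one textual and one acoustic cue."""
    text_cue = 0.0 if token.word.lower() in FUNCTION_WORDS else 1.0
    acoustic_cue = 1.0 if token.pause_after >= 0.3 else 0.0
    return 0.5 * text_cue + 0.5 * acoustic_cue


def segment_stream(tokens: Iterable[Token], threshold: float = 0.75,
                   max_len: int = 20) -> Iterator[List[str]]:
    """Emit a chunk when the score reaches the threshold or the chunk gets too long."""
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token.word)
        if split_score(token) >= threshold or len(chunk) >= max_len:
            yield chunk
            chunk = []
    if chunk:  # flush whatever remains when the stream ends
        yield chunk


def cascade(tokens: Iterable[Token], translate: Callable[[str], str]) -> Iterator[str]:
    """Feed each chunk to an MT callable as soon as the segmenter emits it."""
    for chunk in segment_stream(tokens):
        yield translate(" ".join(chunk))


stream = [Token("thank", 0.0), Token("you", 0.5), Token("the", 0.4),
          Token("session", 0.0), Token("is", 0.0), Token("open", 0.6)]
chunks = list(segment_stream(stream))
```

With the default threshold, a split requires both cues to agree, so the pause after "the" alone does not end a chunk. In a real system the heuristic would be replaced by a trained model and `translate` by an actual MT system.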
Pages: 2599-2611
Page count: 13