Direct Segmentation Models for Streaming Speech Translation

Authors
Iranzo-Sanchez, Javier [1]
Pastor, Adria Gimenez [1]
Silvestre-Cerda, Joan Albert [1]
Baquero-Arnal, Pau [1]
Civera, Jorge [1]
Juan, Alfons [1]
Affiliations
[1] Univ Politècn València, Machine Learning & Language Proc MLLP Res Grp, Valencian Res Inst Artificial Intelligence VRAIN, Valencia, Spain
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The cascade approach to Speech Translation (ST) is based on a pipeline in which an Automatic Speech Recognition (ASR) system is followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into, ideally, semantically self-contained chunks to be fed into the MT system. This is especially challenging in streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that use not only textual but also acoustic information to decide when to split the ASR output into a chunk. Extensive experiments are carried out on the Europarl-ST dataset to demonstrate the contribution of acoustic information to the performance of the segmentation model, measured by BLEU score in a streaming ST scenario. Finally, comparative results show that the proposed segmentation models outperform those of previous work.
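The role of a streaming segmenter in the cascade described above can be illustrated with a minimal Python sketch. It is not the model proposed in the paper: the Token type, the pause_after feature, the toy_split_score function, and all weights and thresholds are hypothetical stand-ins, chosen only to show how a textual cue and an acoustic cue might be combined under a bounded lookahead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Token:
    text: str           # word emitted by the ASR system
    pause_after: float  # silence in seconds after the word (assumed acoustic feature)


def toy_split_score(history: List[Token], future: List[Token]) -> float:
    """Hypothetical scorer: mixes one textual cue with one acoustic cue."""
    # Textual cue: the next word looks like the start of a new clause.
    text_cue = 1.0 if future and future[0].text in {"and", "but", "so"} else 0.0
    # Acoustic cue: pause length after the last word, saturating at 0.5 s.
    acoustic_cue = min(history[-1].pause_after / 0.5, 1.0)
    return 0.4 * text_cue + 0.6 * acoustic_cue


def stream_segments(
    tokens: Iterable[Token],
    score: Callable[[List[Token], List[Token]], float] = toy_split_score,
    future_window: int = 1,   # lookahead in words; larger values add latency
    threshold: float = 0.5,
    max_len: int = 20,        # force a split so chunks cannot grow unbounded
) -> Iterator[List[str]]:
    """Yield chunks of words as soon as a split decision can be made."""
    history: List[Token] = []
    buffer: List[Token] = []
    for tok in tokens:
        buffer.append(tok)
        # Decide on the oldest buffered word once its lookahead is available.
        while len(buffer) > future_window:
            history.append(buffer.pop(0))
            split = score(history, buffer[:future_window]) >= threshold
            if split or len(history) >= max_len:
                yield [t.text for t in history]
                history = []
    # End of stream: flush whatever is left as a final chunk.
    history.extend(buffer)
    if history:
        yield [t.text for t in history]


if __name__ == "__main__":
    asr_output = [
        Token("hello", 0.1), Token("world", 0.6), Token("and", 0.0),
        Token("this", 0.05), Token("works", 0.0),
    ]
    for chunk in stream_segments(asr_output):
        print(" ".join(chunk))  # each chunk would be passed to the MT system
```

In this sketch the future_window parameter makes the latency trade-off explicit: a split after a word can only be decided once that many following words have arrived.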
Pages: 2599-2611
Number of pages: 13