Cascade or Direct Speech Translation? A Case Study

Cited by: 2
Authors
Etchegoyhen, Thierry [1]
Arzelus, Haritz [1]
Gete, Harritxu [1, 2]
Alvarez, Aitor [1]
Torre, Ivan G. [1]
Martin-Donas, Juan Manuel [1]
Gonzalez-Docasal, Ander [1, 3]
Fernandez, Edson Benites [4]
Affiliations
[1] Vicomtech Fdn, Basque Res & Technol Alliance BRTA, Donostia San Sebastian 20009, Spain
[2] Univ Basque Country, Fac Informat, Donostia San Sebastian 20018, Spain
[3] Univ Zaragoza, Sch Engn & Architecture, Zaragoza 50018, Spain
[4] Vicomtech, Donostia San Sebastian 20009, Spain
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 03
Keywords
speech translation; Basque; Spanish; corpus; cascade speech translation; direct speech translation
DOI
10.3390/app12031097
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speech translation has traditionally been tackled under a cascade approach, chaining speech recognition and machine translation components to translate from an audio source in a given language into text or speech in a target language. Leveraging deep learning approaches to natural language processing, recent studies have explored the potential of direct end-to-end neural modelling to perform the speech translation task. Though several benefits may come from end-to-end modelling, such as a reduction in latency and error propagation, the comparative merits of each approach still deserve detailed evaluations and analyses. In this work, we compared state-of-the-art cascade and direct approaches on the under-resourced Basque-Spanish language pair, which features challenging phenomena such as marked differences in morphology and word order. This case study thus complements other studies in the field, which mostly revolve around the English language. We described and analysed in detail the mintzai-ST corpus, prepared from the sessions of the Basque Parliament, and evaluated the strengths and limitations of cascade and direct speech translation models trained on this corpus, with variants exploiting additional data as well. Our results indicated that, despite significant progress with end-to-end models, which may outperform alternatives in some cases in terms of automated metrics, a cascade approach proved optimal overall in our experiments and manual evaluations.
Pages: 24
Related papers
50 in total
  • [21] Translation, direct quotation and decontextualisation (Reported speech, process of translation, cultural criteria)
    Slembrouck, S
    [J]. PERSPECTIVES-STUDIES IN TRANSLATION THEORY AND PRACTICE, 1999, 7 (01): 81-108
  • [22] Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
    Han, Yuchen
    Xu, Chen
    Xiao, Tong
    Zhu, Jingbo
    [J]. 61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023: 1340-1348
  • [23] Comparative study on corpora for speech translation
    Kikui, Genichiro
    Yamamoto, Seiichi
    Takezawa, Toshiyuki
    Sumita, Eiichiro
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): 1674-1682
  • [24] UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
    Inaguma, Hirofumi
    Popuri, Sravya
    Kulikov, Ilia
    Chen, Peng-Jen
    Wang, Changhan
    Chung, Yu-An
    Tang, Yun
    Lee, Ann
    Watanabe, Shinji
    Pino, Juan
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 15655-15680
  • [25] Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
    Dong, Qianqian
    Yue, Fengpeng
    Ko, Tom
    Wang, Mingxuan
    Bai, Qibing
    Zhang, Yu
    [J]. INTERSPEECH 2022, 2022: 1781-1785
  • [26] INSTANCE-BASED MODEL ADAPTATION FOR DIRECT SPEECH TRANSLATION
    Di Gangi, Mattia A.
    Viet-Nhat Nguyen
    Negri, Matteo
    Turchi, Marco
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 7914-7918
  • [27] Direct Text to Speech Translation System Using Acoustic Units
    Mingote, Victoria
    Gimeno, Pablo
    Vicente, Luis
    Khurana, Sameer
    Laurent, Antoine
    Duret, Jarod
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 1262-1266
  • [28] Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
    Jia, Ye
    Ramanovich, Michelle Tadmor
    Remez, Tal
    Pomerantz, Roi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022: 10120-10134
  • [29] Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
    Jia, Ye
    Ding, Yifan
    Bapna, Ankur
    Cherry, Colin
    Zhang, Yu
    Conneau, Alexis
    Morioka, Nobuyuki
    [J]. INTERSPEECH 2022, 2022: 1721-1725
  • [30] Impacts of machine translation and speech synthesis on speech-to-speech translation
    Hashimoto, Kei
    Yamagishi, Junichi
    Byrne, William
    King, Simon
    Tokuda, Keiichi
    [J]. SPEECH COMMUNICATION, 2012, 54 (07): 857-866