Cascade or Direct Speech Translation? A Case Study

Cited by: 2
Authors
Etchegoyhen, Thierry [1]
Arzelus, Haritz [1]
Gete, Harritxu [1, 2]
Alvarez, Aitor [1]
Torre, Ivan G. [1]
Martin-Donas, Juan Manuel [1]
Gonzalez-Docasal, Ander [1, 3]
Fernandez, Edson Benites [4]
Affiliations
[1] Vicomtech Fdn, Basque Res & Technol Alliance BRTA, Donostia San Sebastian 20009, Spain
[2] Univ Basque Country, Fac Informat, Donostia San Sebastian 20018, Spain
[3] Univ Zaragoza, Sch Engn & Architecture, Zaragoza 50018, Spain
[4] Vicomtech, Donostia San Sebastian 20009, Spain
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / No. 3
Keywords
speech translation; Basque; Spanish; corpus; cascade speech translation; direct speech translation
DOI
10.3390/app12031097
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Speech translation has traditionally been tackled under a cascade approach, chaining speech recognition and machine translation components to translate from an audio source in a given language into text or speech in a target language. Leveraging deep learning approaches to natural language processing, recent studies have explored the potential of direct end-to-end neural modelling to perform the speech translation task. Though end-to-end modelling may bring several benefits, such as reduced latency and less error propagation, the comparative merits of each approach still deserve detailed evaluation and analysis. In this work, we compared state-of-the-art cascade and direct approaches on the under-resourced Basque-Spanish language pair, which features challenging phenomena such as marked differences in morphology and word order. This case study thus complements other studies in the field, which mostly revolve around the English language. We described and analysed in detail the mintzai-ST corpus, prepared from the sessions of the Basque Parliament, and evaluated the strengths and limitations of cascade and direct speech translation models trained on this corpus, as well as variants exploiting additional data. Our results indicated that, despite significant progress with end-to-end models, which may outperform alternatives in some cases in terms of automated metrics, a cascade approach proved optimal overall in our experiments and manual evaluations.
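The abstract contrasts two architectures: a cascade that chains automatic speech recognition (ASR) and machine translation (MT), and a direct model that maps source audio straight to target-language text. The Python sketch below illustrates only the cascade composition and why it is exposed to error propagation. The type aliases, the toy Basque-to-Spanish lexicon and the functions `make_cascade`, `toy_asr`, `noisy_asr` and `toy_mt` are hypothetical stand-ins for trained models, not the systems evaluated in the paper.

```python
from collections.abc import Callable

# Illustrative type aliases (assumptions, not taken from the paper).
Audio = list[float]                         # raw audio as a list of samples
ASR = Callable[[Audio], str]                # speech -> source-language transcript
MT = Callable[[str], str]                   # source text -> target-language text
SpeechTranslator = Callable[[Audio], str]   # speech -> target-language text


def make_cascade(asr: ASR, mt: MT) -> SpeechTranslator:
    """Chain recognition and translation into a single speech translator.

    The MT step only ever sees the ASR transcript, so any recognition
    error is passed on unchanged (error propagation).
    """
    def translate(audio: Audio) -> str:
        transcript = asr(audio)   # step 1: speech recognition
        return mt(transcript)     # step 2: text-to-text translation
    return translate


# Hypothetical toy components standing in for trained models.
LEXICON = {"kaixo": "hola", "mundua": "mundo"}


def toy_asr(audio: Audio) -> str:
    return "kaixo mundua"         # pretends to recognise the audio correctly


def noisy_asr(audio: Audio) -> str:
    return "kaixo munduan"        # one misrecognised word


def toy_mt(text: str) -> str:
    # Word-by-word lookup; unknown words are copied through untranslated.
    return " ".join(LEXICON.get(word, word) for word in text.split())


clean = make_cascade(toy_asr, toy_mt)
noisy = make_cascade(noisy_asr, toy_mt)
print(clean([0.0]))   # prints: hola mundo
print(noisy([0.0]))   # prints: hola munduan
```

A direct model would instead be a single `SpeechTranslator` trained end to end, with no intermediate transcript through which recognition errors can pass; the abstract nonetheless reports that the cascade approach proved optimal overall in this study.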
Pages: 24