Cascade or Direct Speech Translation? A Case Study

Cited by: 2

Authors:

Etchegoyhen, Thierry ^{[1]}

Arzelus, Haritz ^{[1]}

Gete, Harritxu ^{[1,2]}

Alvarez, Aitor ^{[1]}

Torre, Ivan G. ^{[1]}

Martin-Donas, Juan Manuel ^{[1]}

Gonzalez-Docasal, Ander ^{[1,3]}

Fernandez, Edson Benites ^{[4]}

Affiliations:

[1] Vicomtech Fdn, Basque Res & Technol Alliance BRTA, Donostia San Sebastian 20009, Spain

[2] Univ Basque Country, Fac Informat, Donostia San Sebastian 20018, Spain

[3] Univ Zaragoza, Sch Engn & Architecture, Zaragoza 50018, Spain

[4] Vicomtech, Donostia San Sebastian 20009, Spain

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 03

Keywords:

speech translation; Basque; Spanish; corpus; cascade speech translation; direct speech translation

DOI:

10.3390/app12031097

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Speech translation has traditionally been tackled with a cascade approach, chaining speech recognition and machine translation components to translate from an audio source in a given language into text or speech in a target language. Leveraging deep learning approaches to natural language processing, recent studies have explored the potential of direct end-to-end neural modelling to perform the speech translation task. Though several benefits may come from end-to-end modelling, such as a reduction in latency and error propagation, the comparative merits of each approach still deserve detailed evaluations and analyses. In this work, we compared state-of-the-art cascade and direct approaches on the under-resourced Basque-Spanish language pair, which features challenging phenomena such as marked differences in morphology and word order. This case study thus complements other studies in the field, which mostly revolve around the English language. We described and analysed in detail the mintzai-ST corpus, prepared from the sessions of the Basque Parliament, and evaluated the strengths and limitations of cascade and direct speech translation models trained on this corpus, with variants exploiting additional data as well. Our results indicated that, despite significant progress with end-to-end models, which may outperform alternatives in some cases in terms of automated metrics, a cascade approach proved optimal overall in our experiments and manual evaluations.

Pages: 24

