Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Cited by: 1
Authors
Dong, Qianqian [1]
Yue, Fengpeng [1,2]
Ko, Tom [1]
Wang, Mingxuan [1]
Bai, Qibing [1,2]
Zhang, Yu [2,3]
Affiliations
[1] ByteDance AI Lab, Beijing, Peoples R China
[2] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Source
INTERSPEECH 2022
Keywords
speech translation; speech-to-speech translation; pseudo-labeling
DOI
10.21437/Interspeech.2022-10011
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Direct speech-to-speech translation (S2ST) has attracted increasing attention recently. The task is highly challenging due to data scarcity and the complexity of the speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo data with a combination of popular techniques that are non-trivial to apply to S2ST. Moreover, we evaluate our approach on both a syntactically similar (Spanish-English) and a distant (English-Chinese) language pair. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.
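To make the pseudo-labeling idea in the abstract concrete, below is a minimal Python sketch of one common recipe: converting an ASR corpus (source speech plus transcript) into pseudo-parallel S2ST training pairs by machine-translating each transcript and synthesizing target speech from the translation. This is a hedged illustration, not the authors' verified pipeline; translate_text and synthesize_speech are hypothetical stand-ins for real MT and TTS models, and the file paths are invented.

# Hedged sketch of pseudo-labeling for S2ST data augmentation.
# The ASR -> MT -> TTS recipe is illustrative, not the paper's exact
# pipeline; translate_text and synthesize_speech stand in for real
# MT and TTS models.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class S2STExample:
    source_audio: str  # path to a source-language utterance
    target_audio: str  # path to the pseudo target-language utterance

def pseudo_label_asr_corpus(
    asr_corpus: List[Tuple[str, str]],        # (audio_path, transcript) pairs
    translate_text: Callable[[str], str],     # hypothetical MT model
    synthesize_speech: Callable[[str], str],  # hypothetical TTS model
) -> List[S2STExample]:
    """Turn an ASR corpus into pseudo-parallel S2ST training pairs."""
    pseudo_data = []
    for audio_path, transcript in asr_corpus:
        translation = translate_text(transcript)       # pseudo target text
        target_audio = synthesize_speech(translation)  # pseudo target speech
        pseudo_data.append(S2STExample(audio_path, target_audio))
    return pseudo_data

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    corpus = [("fisher/utt001.wav", "how are you")]
    mt = lambda text: "como estas"                     # placeholder MT
    tts = lambda text: f"synth/{abs(hash(text))}.wav"  # placeholder TTS
    for example in pseudo_label_asr_corpus(corpus, mt, tts):
        print(example)

The resulting pseudo pairs would typically be mixed with the limited genuine S2ST data during training, which is the usual way this kind of augmentation is consumed.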
Pages: 1781-1785
Page count: 5
Related papers
50 records in total
  • [1] Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
    Jia, Ye
    Ding, Yifan
    Bapna, Ankur
    Cherry, Colin
    Zhang, Yu
    Conneau, Alexis
    Morioka, Nobuyuki
    [J]. INTERSPEECH 2022, 2022, : 1721 - 1725
  • [2] Direct Speech-to-Speech Translation With Discrete Units
    Lee, Ann
    Chen, Peng-Jen
    Wang, Changhan
    Gu, Jiatao
    Popuri, Sravya
    Ma, Xutai
    Polyak, Adam
    Adi, Yossi
    He, Qing
    Tang, Yun
    Pino, Juan
    Hsu, Wei-Ning
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022, : 3327 - 3339
  • [3] Textless Speech-to-Speech Translation on Real Data
    Lee, Ann
    Gong, Hongyu
    Duquenne, Paul-Ambroise
    Schwenk, Holger
    Chen, Peng-Jen
    Wang, Changhan
    Popuri, Sravya
    Adi, Yossi
    Pino, Juan
    Gu, Jiatao
    Hsu, Wei-Ning
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 860 - 872
  • [4] Transformer-based Direct Speech-to-Speech Translation with Transcoder
    Kano, Takatomo
    Sakti, Sakriani
    Nakamura, Satoshi
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 958 - 965
  • [5] Direct speech-to-speech translation with a sequence-to-sequence model
    Jia, Ye
    Weiss, Ron J.
    Biadsy, Fadi
    Macherey, Wolfgang
    Johnson, Melvin
    Chen, Zhifeng
    Wu, Yonghui
    [J]. INTERSPEECH 2019, 2019, : 1123 - 1127
  • [6] Direct vs Cascaded Speech-to-Speech Translation Using Transformer
    Arya, Lalaram
    Chowdhury, Amartya Roy
    Prasanna, S. R. Mahadeva
    [J]. SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 258 - 270
  • [7] Impacts of machine translation and speech synthesis on speech-to-speech translation
    Hashimoto, Kei
    Yamagishi, Junichi
    Byrne, William
    King, Simon
    Tokuda, Keiichi
    [J]. SPEECH COMMUNICATION, 2012, 54 (07) : 857 - 866
  • [8] The NESPOLE! speech-to-speech translation system
    Lavie, A
    Levin, L
    Frederking, R
    Pianesi, F
    [J]. MACHINE TRANSLATION: FROM RESEARCH TO REAL USERS, 2002, 2499 : 240 - 243
  • [9] Hierarchical Classification for Speech-to-Speech Translation
    Ettelaie, Emil
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth S.
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2534 - 2537
  • [10] Towards Machine Speech-to-speech Translation
    Nakamura, Satoshi
    Sudoh, Katsuhito
    Sakti, Sakriani
    [J]. TRADUMATICA-TRADUCCIO I TECNOLOGIES DE LA INFORMACIO I LA COMUNICACIO, 2019, (17) : 81 - 87