Textless Speech-to-Speech Translation on Real Data

Cited by: 0

Authors:

Lee, Ann [1]

Gong, Hongyu [1]

Duquenne, Paul-Ambroise [1]

Schwenk, Holger [1]

Chen, Peng-Jen [1]

Wang, Changhan [1]

Popuri, Sravya [1]

Adi, Yossi [1]

Pino, Juan [1]

Gu, Jiatao [1]

Hsu, Wei-Ning [1]

Affiliations:

[1] Meta AI, Menlo Park, CA 94025, USA

Source:

NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average 3.2 BLEU gain when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on unnormalized speech target. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.


Pages: 860-872

Page count: 13

