Textless Speech-to-Speech Translation on Real Data

Cited by: 0
Authors
Lee, Ann [1 ]
Gong, Hongyu [1 ]
Duquenne, Paul-Ambroise [1 ]
Schwenk, Holger [1 ]
Chen, Peng-Jen [1 ]
Wang, Changhan [1 ]
Popuri, Sravya [1 ]
Adi, Yossi [1 ]
Pino, Juan [1 ]
Gu, Jiatao [1 ]
Hsu, Wei-Ning [1 ]
Affiliations
[1] Meta AI, Menlo Park, CA 94025 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need of any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average 3.2 BLEU gain when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on unnormalized speech target. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.
Pages: 860 - 872
Number of pages: 13
Related papers
50 in total
  • [31] CORBA-based speech-to-speech translation system
    Gruhn, R
    Takashima, K
    Nishino, A
    Nakamura, S
    [J]. ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 355 - 358
  • [32] Speech-to-speech translation services for the Olympic Games 2008
    Stueker, Sebastian
    Zong, Chengqing
    Reichert, Juergen
    Cao, Wenjie
    Kolss, Muntsin
    Xie, Guodong
    Peterson, Kay
    Ding, Peng
    Arranz, Victoria
    Yu, Jian
    Waibel, Alex
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 297 - +
  • [33] A hand-held speech-to-speech translation system
    Zhou, BW
    Gao, YQ
    Sorensen, J
    Déchelotte, D
    Picheny, M
    [J]. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 664 - 669
  • [34] Speech-to-speech translation software on PDAs for travel conversation
    Isotani, Ryosuke
    Yamabana, Kiyoshi
    Ando, Shinichi
    Hanazawa, Ken
    Ishikawa, Shin-Ya
    Iso, Ken-Ichi
    [J]. NEC Research and Development, 2003, 44 (SPEC.): : 197 - 202
  • [35] CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
    Jia, Ye
    Ramanovich, Michelle Tadmor
    Wang, Quan
    Zen, Heiga
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6691 - 6703
  • [36] Rhonda: the architecture of a multilingual speech-to-speech translation pipeline
    Louw, Johannes A.
    Moodley, Avashlin
    [J]. 2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AND INNOVATIVE COMPUTING APPLICATIONS (ICONIC), 2018, : 194 - 200
  • [37] Speech-to-speech translation software on PDAs for travel conversation
    Isotani, R
    Yamabana, K
    Ando, S
    Hanazawa, K
    Ishikawa, S
    Iso, K
    [J]. NEC RESEARCH & DEVELOPMENT, 2003, 44 (02): : 197 - 202
  • [38] Multilingual Web Conferencing Using Speech-to-Speech Translation
    Chen, John
    Wen, Shufei
    Sridhar, Vivek Kumar Rangarajan
    Bangalore, Srinivas
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1860 - 1862
  • [39] TECNOPARLA - Speech technologies for Catalan and its application to Speech-to-speech Translation
    Schulz, Henrik
    Costa-Jussa, Marta R.
    Fonollosa, Jose A. R.
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): : 319 - 320
  • [40] NAME AWARE SPEECH-TO-SPEECH TRANSLATION FOR ENGLISH/IRAQI
    Prasad, Rohit
    Moran, Christine
    Choi, Fred
    Meermeier, Ralf
    Saleem, Shirin
    Kao, Chia-lin
    Stallard, Dave
    Natarajan, Prem
    [J]. 2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 249 - 252