The ATR multilingual speech-to-speech translation system

Cited by: 62
Authors
Nakamura, S [1]
Markov, K [1]
Nakaiwa, H [1]
Kikui, G [1]
Kawai, H [1]
Jitsuhiro, T [1]
Zhang, JS [1]
Yamamoto, H [1]
Sumita, E [1]
Yamamoto, S [1]
Affiliation
[1] ATR Spoken Language Translation Research Laboratories, Kyoto 619-0288, Japan
Keywords
example-based machine translation (EBMT); minimum description length (MDL); multiclass language model; speech-to-speech translation (S2S); statistical machine translation (SMT); successive state splitting (SSS); text-to-speech (TTS) conversion
DOI
10.1109/TSA.2005.860774
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules of our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them are multilingual and are designed using state-of-the-art technologies developed at ATR. A corpus-based statistical machine learning framework forms the basis of our system design. We use a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations. Recent evaluation of the overall system showed that speech-to-speech translation quality is high, being at the level of a person having a Test of English for International Communication (TOEIC) score of 750 out of the perfect score of 990.
Pages: 365-376
Number of pages: 12