Incremental TTS for Japanese Language

Cited by: 4
Authors
Yanagita, Tomoya [1]
Sakti, Sakriani [1, 2]
Nakamura, Satoshi [1, 2]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Japan
[2] RIKEN, Ctr Adv Intelligence Project AIP, Wako, Saitama, Japan
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
Incremental speech synthesis; linguistic and temporal locality features; HMM based speech synthesis;
DOI
10.21437/Interspeech.2018-1561
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Simultaneous lecture translation requires speech to be translated in real time, before the speaker has finished an entire sentence, since a long delay makes it difficult for listeners to follow the lecture. The challenge is to construct a full-fledged system with speech recognition, machine translation, and text-to-speech synthesis (TTS) components that can produce high-quality speech translations on the fly. For TTS specifically, this poses a problem, because a conventional framework commonly requires the language-dependent contextual linguistic information of a full sentence to produce a natural-sounding speech waveform. Several studies have proposed approaches to incremental TTS (ITTS) that estimate the target prosody from only partial knowledge of the sentence. However, most investigations have been carried out only for French, English, and German. French is a syllable-timed language, and the other two are stress-timed languages. Japanese, which is a mora-timed language, has not been investigated so far. In this paper, we evaluate the quality of Japanese synthesized speech based on various linguistic and temporal incremental units. Experimental results reveal that an accent phrase incremental unit (a group of moras) is essential for a Japanese ITTS as a trade-off between quality and synthesis units.
Pages: 902-906
Page count: 5