Incremental TTS for Japanese Language

Cited by: 4

Authors:

Yanagita, Tomoya [1]

Sakti, Sakriani [1,2]

Nakamura, Satoshi [1,2]

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Japan

[2] RIKEN, Ctr Adv Intelligence Project AIP, Wako, Saitama, Japan

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Incremental speech synthesis; linguistic and temporal locality features; HMM based speech synthesis;

DOI:

10.21437/Interspeech.2018-1561

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Simultaneous lecture translation requires speech to be translated in real time, before the speaker has finished an entire sentence, since a long delay makes it difficult for listeners to follow the lecture. The challenge is to construct a full-fledged system with speech recognition, machine translation, and text-to-speech synthesis (TTS) components that can produce high-quality speech translations on the fly. This poses a particular problem for TTS, because a conventional framework commonly requires the language-dependent contextual linguistic information of a full sentence to produce a natural-sounding speech waveform. Several studies have proposed approaches to incremental TTS (ITTS), which estimates the target prosody from only partial knowledge of the sentence. However, most investigations have covered only French, English, and German. French is a syllable-timed language and the others are stress-timed languages. Japanese, which is a mora-timed language, has not been investigated so far. In this paper, we evaluate the quality of Japanese synthesized speech based on various linguistic and temporal incremental units. Experimental results reveal that an accent phrase incremental unit (a group of moras) is essential for a Japanese ITTS as a trade-off between quality and synthesis units.

Citation

Pages: 902-906

Page count: 5

