Incremental TTS for Japanese Language

Cited by: 4

Authors:

Yanagita, Tomoya ^{[1]}

Sakti, Sakriani ^{[1,2]}

Nakamura, Satoshi ^{[1,2]}

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Japan

[2] RIKEN, Ctr Adv Intelligence Project AIP, Wako, Saitama, Japan

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Incremental speech synthesis; linguistic and temporal locality features; HMM based speech synthesis;

DOI:

10.21437/Interspeech.2018-1561

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Simultaneous lecture translation requires speech to be translated in real time, before the speaker has finished an entire sentence, since a long delay makes it difficult for listeners to follow the lecture. The challenge is to construct a full-fledged system with speech recognition, machine translation, and text-to-speech synthesis (TTS) components that can produce high-quality speech translations on the fly. For TTS specifically, this poses problems because a conventional framework commonly requires the language-dependent contextual linguistic information of a full sentence to produce a natural-sounding speech waveform. Several studies have proposed approaches to incremental TTS (ITTS), which estimates the target prosody from only partial knowledge of the sentence. However, most investigations have been conducted only on French, English, and German. French is a syllable-timed language and the other two are stress-timed languages. Japanese, which is a mora-timed language, has not been investigated so far. In this paper, we evaluate the quality of Japanese synthesized speech based on various linguistic and temporal incremental units. Experimental results reveal that an accent phrase incremental unit (a group of moras) is essential for a Japanese ITTS as a trade-off between quality and synthesis units.

Citation

Pages: 902-906

Page count: 5

