Incremental TTS for Japanese Language

Cited by: 4
Authors
Yanagita, Tomoya [1 ]
Sakti, Sakriani [1 ,2 ]
Nakamura, Satoshi [1 ,2 ]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Japan
[2] RIKEN, Ctr Adv Intelligence Project AIP, Wako, Saitama, Japan
Keywords
Incremental speech synthesis; linguistic and temporal locality features; HMM-based speech synthesis
DOI
10.21437/Interspeech.2018-1561
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Simultaneous lecture translation requires speech to be translated in real time, before the speaker has finished an entire sentence, since a long delay makes it difficult for listeners to follow the lecture. The challenge is to construct a full-fledged system whose speech recognition, machine translation, and text-to-speech synthesis (TTS) components can produce high-quality speech translations on the fly. For TTS in particular, this poses a problem because a conventional framework commonly requires the language-dependent contextual linguistic features of a full sentence to produce a natural-sounding speech waveform. Several studies have proposed incremental TTS (ITTS) approaches that estimate the target prosody from only partial knowledge of the sentence. However, most investigations have been carried out only for French, English, and German; French is a syllable-timed language and the others are stress-timed languages. Japanese, which is a mora-timed language, has not been investigated so far. In this paper, we evaluate the quality of Japanese synthesized speech based on various linguistic and temporal incremental units. Experimental results reveal that an accent-phrase incremental unit (a group of moras) is essential for a Japanese ITTS, offering a good trade-off between speech quality and synthesis-unit size.
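The incremental strategy summarized in the abstract can be pictured as follows: text fragments arrive over time, are grouped into accent-phrase-sized units, and each completed unit is synthesized immediately instead of waiting for the full sentence. The sketch below is a hypothetical illustration only, not the paper's HMM-based system; split_units and synthesize_unit are placeholder names, and the punctuation-based segmentation stands in for the accent-phrase analysis a real Japanese ITTS front end would perform.

```python
# Minimal sketch of accent-phrase-based incremental synthesis (hypothetical API).
import re
from typing import Iterator, List

PAUSE_MARKS = ("、", "。")

def split_units(text: str) -> List[str]:
    # Split after Japanese pause punctuation as a crude stand-in for
    # accent-phrase segmentation by a morphological analyzer / accent estimator.
    return [u for u in re.split(r"(?<=[、。])", text) if u]

def synthesize_unit(unit: str, left_context: str) -> bytes:
    # Placeholder: a real synthesizer would generate a waveform for this unit,
    # conditioned on whatever left context has already been spoken.
    return b""

def incremental_tts(fragments: Iterator[str]) -> Iterator[bytes]:
    """Emit audio per accent-phrase-sized unit instead of per full sentence."""
    pending, spoken = "", ""
    for fragment in fragments:
        pending += fragment
        units = split_units(pending)
        # Units ending in pause punctuation are treated as complete.
        for unit in [u for u in units if u.endswith(PAUSE_MARKS)]:
            yield synthesize_unit(unit, left_context=spoken)
            spoken += unit
        pending = "".join(u for u in units if not u.endswith(PAUSE_MARKS))
    if pending:  # flush whatever remains at the end of the stream
        yield synthesize_unit(pending, left_context=spoken)
```

The design choice this illustrates is the paper's trade-off: smaller units (e.g. single moras) reduce latency but give the synthesizer less context, while accent-phrase units keep latency low yet preserve enough local context for natural prosody.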
Pages: 902-906
Number of pages: 5
Related Papers
50 records in total
  • [1] Japanese subsidiary for TTS
    [Anonymous]
    NAVAL ARCHITECT, 1999, : 27 - 27
  • [2] THE TTS LANGUAGE FOR MUSIC DESCRIPTION
    BALABAN, M
    INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1988, 28 (05): : 505 - 523
  • [3] HMM based TTS for Mixed Language Text
    Shuang, Zhiwei
    Kang, Shiyin
    Qin, Yong
    Dai, Lirong
    Cai, Lianhong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 618 - +
  • [4] Modeling Pause Duration for Malayalam Language TTS
    James, Jesin
    Gopinath, Deepa P.
    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 434 - 438
  • [5] What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS
    Stephenson, Brooke
    Besacier, Laurent
    Girin, Laurent
    Hueber, Thomas
    INTERSPEECH 2020, 2020, : 215 - 219
  • [6] POLYPHONE DISAMBIGUATION AND ACCENT PREDICTION USING PRE-TRAINED LANGUAGE MODELS IN JAPANESE TTS FRONT-END
    Hida, Rem
    Hamada, Masaki
    Kamada, Chie
    Tsunoo, Emiru
    Sekiya, Toshiyuki
    Kumakura, Toshiyuki
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7132 - 7136
  • [7] When Is TTS Augmentation Through a Pivot Language Useful?
    Robinson, Nathaniel
    Ogayo, Perez
    Gangu, Swetha
    Mortensen, David R.
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3538 - 3542
  • [8] PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model
    Li, Shuhua
    Mao, Qirong
    Shi, Jiatong
    INTERSPEECH 2024, 2024, : 4888 - 4892
  • [9] Incremental English-Japanese Spoken Language Translation Utilizing Ill-formed Expressions
    Matsubara, S.
    Asai, S.
    Toyama, K.
    Inagaki, Y.
    Denki Gakkai Ronbunshi. C, Erekutoronikusu Joho Kogaku, Shisutemu, 1998, 118 (01):
  • [10] A Unit Selection Methods using Flexible Break in a Japanese TTS
    Song, Young-Hwan
    Na, Deok-Su
    Kim, Jong-Kuk
    Bae, Myung-Jin
    Lee, Jong-Seok
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2007, 26 (08): : 403 - 408