Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech

Cited by: 1
Authors
Nguyen Thi Thu Trang [1]
Nguyen Hoang Ky [1]
Rilliard, Albert [2]
d'Alessandro, Christophe [3]
Affiliations
[1] Hanoi Univ Sci & Technol, Hanoi, Vietnam
[2] Univ Paris Saclay, CNRS, LISN, Gif Sur Yvette, France
[3] Sorbonne Univ, Inst Jean le Rond dAlembert, UMR7190 CNRS, Paris, France
Source
Keywords
Prosody modeling; prosodic boundary; pause prediction; Text-To-Speech; speech synthesis; Vietnamese;
DOI
10.21437/Interspeech.2021-125
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
This research aims to build a prosodic boundary prediction model to improve the naturalness of Vietnamese speech synthesis. The model can be used directly to predict prosodic boundaries in the synthesis phase of statistical parametric or end-to-end speech synthesis systems. Besides conventional features related to Part-Of-Speech (POS), this paper proposes two efficient features for predicting prosodic boundaries, syntactic blocks and syntactic links, based on a thorough analysis of a Vietnamese dataset. Syntactic blocks are syntactic phrases whose sizes are bounded in their constituent syntactic tree. A syntactic link between two adjacent words is calculated from the distance between them in the syntax tree. The experimental results show that the two proposed predictors improve the boundary prediction model built with a decision tree classification algorithm, giving an F1 score about 36.4% higher than the model with only POS features. The final boundary prediction model, which uses POS, syntactic block, and syntactic link features with the LightGBM algorithm, gives the best F1 score, 87.0% on the test data. The proposed model helps TTS systems built with HMM-based, DNN-based, or end-to-end speech synthesis techniques improve by about 0.3 MOS points (i.e. 6 to 10%) compared to the same systems without the proposed model.
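The syntactic link feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says the link is "calculated based on the distance" in the syntax tree, so everything specific here is an assumption: the distance is taken to be the number of edges on the path between two adjacent leaves, the tree is a nested `(label, children)` tuple, and the names `leaf_paths`, `syntactic_links`, and `EXAMPLE_TREE` are hypothetical. An English sentence stands in for Vietnamese data.

```python
# Hypothetical sketch: the abstract does not give the exact formula.
# Assumption: link(w_i, w_i+1) = number of edges on the tree path
# between the two adjacent leaves (up to their lowest common ancestor
# and back down).

def leaf_paths(tree, prefix=()):
    """Yield (word, path) per leaf; path = child indices from the root."""
    _label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, prefix + (i,)
        else:
            yield from leaf_paths(child, prefix + (i,))


def syntactic_links(tree):
    """Return (left_word, right_word, distance) for each adjacent pair."""
    leaves = list(leaf_paths(tree))
    links = []
    for (w1, p1), (w2, p2) in zip(leaves, leaves[1:]):
        # Length of the shared path prefix = depth of the common ancestor.
        common = 0
        for a, b in zip(p1, p2):
            if a != b:
                break
            common += 1
        distance = (len(p1) - common) + (len(p2) - common)
        links.append((w1, w2, distance))
    return links


# Illustrative constituency tree (not taken from the paper's dataset).
EXAMPLE_TREE = (
    "S",
    [
        ("NP", ["the", "cat"]),
        ("VP", ["sat", ("PP", ["on", ("NP", ["the", "mat"])])]),
    ],
)

if __name__ == "__main__":
    for left, right, dist in syntactic_links(EXAMPLE_TREE):
        print(f"{left} | {right}: {dist}")
```

Under this assumed definition, the largest distance in the example (4) falls between "cat" and "sat", that is, at the subject/predicate junction, which is the kind of position where a boundary classifier might find such a feature informative.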
Pages: 3885 - 3889
Number of pages: 5
Related papers
50 in total
  • [1] Prosodic boundary prediction model for Vietnamese text-to-speech
    Trang, Nguyen Thi Thu
    Ky, Nguyen Hoang
    Rilliard, Albert
    D'Alessandro, Christophe
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2021, 5 : 3366 - 3370
  • [2] A prosodic model for text-to-speech synthesis in French
    Di Cristo, A
    Di Cristo, P
    Campione, E
    Véronis, J
    INTONATION: ANALYSIS, MODELLING AND TECHNOLOGY, 2000, 15 : 321 - 355
  • [3] A superposed prosodic model for Chinese text-to-speech synthesis
    Chen, GP
    Bailly, G
    Liu, QF
    Wang, RH
    2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004 : 177 - 180
  • [4] A prosodic Turkish text-to-speech synthesizer
    Vural, E
    Oflazer, K
    PROCEEDINGS OF THE IEEE 12TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, 2004 : 458 - 460
  • [5] Trainable prosodic model for standard Chinese Text-to-Speech system
    TAO Jianhua
    Chinese Journal of Acoustics, 2001, (03) : 257 - 265
  • [6] A prosodic phrasing model for a Korean text-to-speech synthesis system
    Yoon, K
    COMPUTER SPEECH AND LANGUAGE, 2006, 20 (01) : 69 - 79
  • [7] Research on prosodic features and their prediction issues in Uyghur Text-to-Speech System
    Hamdulla, Askar
    Rozi, Askar
    Eli, Gulnar
    Tursun, Dilmurat
    PROCEEDINGS OF THE 2009 PACIFIC-ASIA CONFERENCE ON CIRCUITS, COMMUNICATIONS AND SYSTEM, 2009 : 257 - 260
  • [8] A Prosodic Text-to-Speech System for Yoruba Language
    Akinwonmi, Akintoba Emmanuel
    Alese, Boniface Kayode
    2013 8TH INTERNATIONAL CONFERENCE FOR INTERNET TECHNOLOGY AND SECURED TRANSACTIONS (ICITST), 2013 : 630 - 635
  • [9] ON GRANULARITY OF PROSODIC REPRESENTATIONS IN EXPRESSIVE TEXT-TO-SPEECH
    Babianski, Mikolaj
    Pokora, Kamil
    Shah, Raahil
    Sienkiewicz, Rafal
    Korzekwa, Daniel
    Klimkov, Viacheslav
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022 : 892 - 899
  • [10] Prosodic Annotation in a Thai Text-to-speech System
    Potisuk, Siripong
    PACLIC 21: THE 21ST PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, PROCEEDINGS, 2007 : 405 - 414