Prosody Phrase Break Prediction in Vietnamese Using Decision Tree

Cited by: 0
Authors
Kui, Liping [1]
Yang, Jian [1]
Cheng, Yang [1]
He, Bin [1]
Affiliation
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650091, Peoples R China
Keywords
Vietnamese; speech synthesis; prosody phrase; decision tree; C4.5
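DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract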
In a Vietnamese speech synthesis system without prosody phrase breaks, the intelligibility of the synthesized speech is satisfactory, but its naturalness is only fair. To improve naturalness, this paper automatically predicts prosody phrase (L3) breaks using the C4.5 decision tree algorithm. First, we collect Vietnamese text and construct a corpus. The sentences in the corpus undergo word segmentation and part-of-speech (POS) tagging, both performed with text analysis software, and L3 breaks are labeled manually; this yields the training data and testing data. Second, we extract the relevant attributes from the training data and build a decision tree with the C4.5 algorithm. The pruned decision tree is then used to predict L3 breaks in the prosody labeling stage. Finally, we evaluate the predictions with objective and subjective tests. The results show an F-score of 59.96% and an acceptable rate of 70.6% for L3 prediction on the closed set, and an F-score of 58.37% and an acceptable rate of 68.9% on the open set. This experiment lays a foundation for further improving the naturalness of synthesized Vietnamese speech.
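The two quantitative ingredients named in the abstract, the C4.5 split criterion and the F-score, can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names (`prev_pos`, `next_pos`), the POS tags, and the toy labels are hypothetical, and the F-score is assumed to be the balanced F1 measure, which the abstract does not specify.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of one attribute, the split criterion used by C4.5."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum((len(g) / total) * math.log2(len(g) / total)
                      for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def f_score(predicted, actual, positive="break"):
    """Balanced F-score (F1) for the positive class (assumed definition)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one row per candidate word boundary, described by
# the POS tags on either side; labels mark whether an L3 break follows.
rows = [
    {"prev_pos": "N", "next_pos": "C"},
    {"prev_pos": "N", "next_pos": "V"},
    {"prev_pos": "V", "next_pos": "C"},
    {"prev_pos": "V", "next_pos": "N"},
]
labels = ["break", "none", "break", "none"]

# C4.5 would split first on the attribute with the highest gain ratio.
best_attr = max(["prev_pos", "next_pos"],
                key=lambda a: gain_ratio(rows, labels, a))
print(best_attr, f_score(["break", "break", "none", "none"], labels))
```

On this toy data the following POS tag separates the classes perfectly while the preceding tag carries no information, so the gain ratio selects `next_pos`. A full C4.5 learner would repeat the selection recursively on each subset and then prune the tree.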
Pages: 3733-3736
Number of pages: 4