Improving Mandarin Prosody Generation Using Alternative Smoothing Techniques

Cited by: 2

Authors:

Huang, Yi-Chin [1]

Wu, Chung-Hsien [1]

Weng, Si-Ting [2]

Affiliations:

[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan

[2] Natl Cheng Kung Univ, Dept Med Informat, Tainan 701, Taiwan

Source:

IEEE/ACM Transactions on Audio, Speech, and Language Processing | 2016 / Vol. 24 / No. 11

Keywords:

Fujisaki model; hierarchical prosodic pattern; Mandarin prosodic structure; natural speech synthesis; SPEECH SYNTHESIS; HMM; ALGORITHMS; CONTOURS

DOI:

10.1109/TASLP.2016.2588727

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Prosody plays a vital role in conveying both communicative meaning and specific speaking styles in speech communication. In recent years, the Hidden Markov Model (HMM)-based speech synthesis system (HTS) has achieved considerable success and can synthesize stable, smooth speech. However, the prosody of the synthesized speech suffers from over-smoothing, so a better prosodic model is required to restore the natural variability of synthesized speech. This study adopts a hybrid method that alleviates this problem by combining the statistical method with template-based unit selection. First, a two-level clustering approach is proposed to obtain representative prosodic patterns (denoted as codewords) of the hierarchical prosodic structure, which is modeled by a modified Fujisaki model. The prosodic codewords are then used to represent the prosody of each sentence in a parallel corpus consisting of a real speech corpus and its synthesized counterpart obtained from HTS. A synthesized speech utterance is then used as a query to retrieve the prosodic codewords of the utterances in the synthesized corpus. The retrieved synthesized prosodic codewords are mapped to the prosodic codewords of the real speech using linear mapping rules obtained from the parallel corpus. Prosodic codeword language models for prosodic words and prosodic phrases, respectively, are employed to choose the optimal codeword sequence of the real speech. Finally, the most likely sequence of prosodic codewords is obtained using a NURBS-based continuity measure, and this sequence is used to synthesize speech with natural prosody. The results of subjective and objective tests demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with the HMM-based method.
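The abstract says the hierarchical prosodic structure is modeled by a modified Fujisaki model but does not state the modifications. For orientation only, here is a minimal sketch of the standard Fujisaki formulation, not the authors' modified version: log F0 is a base value plus phrase-command impulse responses and accent (tone) command step responses. The constants alpha = 3/s, beta = 20/s and gamma = 0.9 are commonly used defaults assumed here; the class and function names are hypothetical, and accent amplitudes are allowed to be negative, as in Fujisaki-style treatments of tone languages.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhraseCommand:
    onset: float      # T0: command time in seconds
    amplitude: float  # Ap: phrase command magnitude


@dataclass
class AccentCommand:
    onset: float      # T1: accent/tone command start in seconds
    offset: float     # T2: accent/tone command end in seconds
    amplitude: float  # Aa: may be negative (assumption, for falling tonal targets)


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Step response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t: float, fb: float,
                phrases: Sequence[PhraseCommand],
                accents: Sequence[AccentCommand],
                alpha: float = 3.0, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Return F0 (Hz) at time t from base frequency fb and the given commands."""
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        # An accent command contributes a smoothed rectangular pulse.
        log_f0 += a.amplitude * (accent_response(t - a.onset, beta, gamma)
                                 - accent_response(t - a.offset, beta, gamma))
    return math.exp(log_f0)


if __name__ == "__main__":
    phrases = [PhraseCommand(onset=0.0, amplitude=0.5)]
    accents = [AccentCommand(onset=0.2, offset=0.4, amplitude=0.3),
               AccentCommand(onset=0.5, offset=0.7, amplitude=-0.2)]
    for i in range(11):
        t = i * 0.1
        print(f"{t:.1f} s  {fujisaki_f0(t, 120.0, phrases, accents):.1f} Hz")
```

In this formulation, a prosodic pattern can be summarized compactly by its command timings and amplitudes, which is the kind of parameter vector a clustering step could operate on.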
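The final selection step can be pictured as a dynamic-programming (Viterbi) search: each position has a set of candidate real-speech codewords, and the chosen sequence maximizes language-model log-probability plus a weighted continuity score between neighbors. This is a minimal sketch under assumptions the abstract does not state: a single bigram codeword language model (the paper uses separate models for prosodic words and prosodic phrases), a uniform start score, and, standing in for the NURBS-based continuity measure, a simple squared-jump penalty between hypothetical end and start log-F0 values of adjacent codewords. The function name, toy codewords and all numbers are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def best_codeword_sequence(
    candidates: Sequence[Sequence[str]],
    bigram_logprob: Callable[[str, str], float],
    continuity: Callable[[str, str], float],
    weight: float = 1.0,
) -> Tuple[List[str], float]:
    """Viterbi search: choose one codeword per position to maximize
    sum of bigram log-probs + weight * sum of continuity scores."""
    if not candidates or any(len(pos) == 0 for pos in candidates):
        raise ValueError("every position needs at least one candidate codeword")

    def transition(prev: str, cur: str) -> float:
        return bigram_logprob(prev, cur) + weight * continuity(prev, cur)

    # score[c]: best score of any path ending in codeword c (uniform start: an assumption).
    score: Dict[str, float] = {c: 0.0 for c in candidates[0]}
    backpointers: List[Dict[str, str]] = []
    for position in candidates[1:]:
        new_score: Dict[str, float] = {}
        back: Dict[str, str] = {}
        for cur in position:
            best_prev = max(score, key=lambda prev: score[prev] + transition(prev, cur))
            new_score[cur] = score[best_prev] + transition(best_prev, cur)
            back[cur] = best_prev
        score = new_score
        backpointers.append(back)

    # Trace back from the best final codeword.
    last = max(score, key=lambda c: score[c])
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example (made-up values): each codeword has hypothetical (start, end) log-F0.
shape = {"A": (5.0, 5.2), "B": (5.2, 5.0), "C": (4.6, 4.8)}
lm = {("A", "B"): math.log(0.6), ("A", "C"): math.log(0.4),
      ("C", "B"): math.log(0.5), ("B", "B"): math.log(0.1)}


def bigram(prev: str, cur: str) -> float:
    return lm.get((prev, cur), math.log(1e-3))  # floor for unseen bigrams


def cont(prev: str, cur: str) -> float:
    # Stand-in for the NURBS-based measure: penalize F0 jumps at the boundary.
    return -100.0 * (shape[prev][1] - shape[cur][0]) ** 2


path, s = best_codeword_sequence([["A", "C"], ["B", "C"]], bigram, cont)
print(path, round(s, 4))
```

Raising `weight` favors sequences whose codewords join smoothly even when the language model prefers a different transition; lowering it lets the language model dominate.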

Pages: 1897-1907

Number of pages: 11
