A quantitative method for modeling context in concatenative synthesis using large speech database

Cited by: 0

Authors:

Hamza, W

Rashwan, M

Afify, M

Affiliation:

Source:

2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM | 2001

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Modeling phonetic context is key to natural-sounding concatenative speech synthesis. This paper proposes a new quantitative method for modeling context. In the proposed method, context is measured as the distance between leaves of the top-down likelihood-based decision trees grown during construction of the acoustic inventory. Unlike other context modeling methods, this method allows the unit selection algorithm to borrow unit occurrences from other contexts when their context distances are small. This is done by incorporating the measured distance as an element of the unit selection cost function. The motivation is that using better unit occurrences from a nearby context reduces the amount of speech modification required. The method also makes it easy to use long synthesis units, e.g. syllables or words, within the same unit selection framework.
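
As an illustration only, the idea of adding a context distance to the unit selection cost might be sketched as follows. The abstract does not say how the distance between tree leaves is computed, so this sketch assumes each leaf is named by its root-to-leaf string of yes/no answers and takes the distance to be the number of tree edges between two leaves. The names `leaf_distance`, `Candidate`, `acoustic_cost`, `select_unit`, and `context_weight` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


def leaf_distance(path_a: str, path_b: str) -> int:
    """Count the tree edges between two leaves of the same decision tree.

    Assumption: a leaf is identified by its root-to-leaf answer string,
    e.g. "YNY". The shared prefix ends at the lowest common ancestor.
    """
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


@dataclass
class Candidate:
    unit_id: str
    leaf_path: str        # leaf this unit occurrence was clustered into
    acoustic_cost: float  # hypothetical mismatch cost (prosody, spectrum, ...)


def select_unit(
    target_leaf: str,
    candidates: list[Candidate],
    context_weight: float = 1.0,
) -> Candidate:
    """Pick the candidate with the lowest combined cost.

    The context distance is one additive element of the cost, so an
    occurrence from a nearby context can win over one from the exact context.
    """
    def cost(c: Candidate) -> float:
        return c.acoustic_cost + context_weight * leaf_distance(target_leaf, c.leaf_path)

    return min(candidates, key=cost)
```

With a target leaf `"YNY"`, a candidate in the sibling leaf `"YNN"` with a low acoustic cost can be chosen over a poorer occurrence in `"YNY"` itself, which is the borrowing behaviour the abstract describes. The additive form and the single weight are design assumptions of this sketch.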


Pages: 789-792

Page count: 4

