Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Cited by: 0

Authors:

Nose, Takashi [1]

Kobayashi, Takao [2]

Affiliations:

[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan

[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan

Source:

2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014

Keywords:

HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM

DOI:

10.1109/IIH-MSP.2014.149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a technique for language-independent prosody modeling that uses unsupervised prosodic labeling in HMM-based speech synthesis, and shows its application to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels that correspond to the speech samples, and manual correction of these labels is needed to improve modeling accuracy, which can add cost and limit the technique's applicability. In contrast, the proposed technique derives the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show that the quantized F0 context is effective without manual prosodic labeling.
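The phone-level quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the quantization scheme, so everything here is an assumption made for illustration: the function name `quantize_phone_f0`, the use of mean log-F0 over voiced frames as the prosodic feature, uniform bins over the observed range, and the default of four levels are all hypothetical and may differ from the authors' method.

```python
import math


def quantize_phone_f0(f0_frames, phone_bounds, n_levels=4):
    """Return one quantization index per phone (None if the phone is unvoiced).

    f0_frames    : per-frame F0 values in Hz, with 0 marking unvoiced frames
    phone_bounds : list of (start, end) frame indexes, end exclusive
    n_levels     : number of uniform quantization levels (assumed value)
    """
    # Mean log-F0 of each phone, computed over its voiced frames only.
    means = []
    for start, end in phone_bounds:
        voiced = [math.log(f) for f in f0_frames[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    observed = [m for m in means if m is not None]
    if not observed:
        return [None] * len(means)

    lo, hi = min(observed), max(observed)
    # Fall back to a width of 1.0 when every phone has the same mean.
    width = (hi - lo) / n_levels or 1.0

    indexes = []
    for m in means:
        if m is None:
            indexes.append(None)
        else:
            # Clamp so the maximum value falls in the top bin.
            indexes.append(min(int((m - lo) / width), n_levels - 1))
    return indexes


f0 = [100, 100, 0, 0, 200, 200, 150, 150]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8)]
print(quantize_phone_f0(f0, bounds))  # [0, None, 3, 2]
```

In this sketch the resulting indexes would play the role of the prosodic context labels: they are derived from the speech data alone, with no manual annotation.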


Pages: 578-581

Page count: 4
