Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Cited by: 0

Authors:

Nose, Takashi [1]

Kobayashi, Takao [2]

Affiliations:

[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan

[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan

Source:

2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014

Keywords:

HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM;

DOI:

10.1109/IIH-MSP.2014.149

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a technique for language-independent prosody modeling based on unsupervised prosodic labeling in HMM-based speech synthesis, and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels for the speech samples, and manual correction is often needed to improve modeling accuracy, which adds cost and limits its applicability. In contrast, the proposed technique derives the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show that the quantized F0 context is effective without manual prosodic labeling.
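
The phone-level quantization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the prosodic feature, the number of levels, or the quantizer, so the sketch assumes the feature is the mean log F0 over the voiced frames of each phone and that the levels are uniform between the utterance minimum and maximum. The function name `quantize_f0_context` and its parameters are hypothetical.

```python
import math

def quantize_f0_context(f0_frames, phone_bounds, num_levels=4):
    """Return one quantization index per phone (None if the phone is unvoiced).

    f0_frames:    per-frame F0 in Hz, with 0 marking unvoiced frames (assumption).
    phone_bounds: list of (start, end) frame indexes, end exclusive (assumption).
    num_levels:   number of uniform quantization levels (assumption).
    """
    # Mean log F0 over the voiced frames of each phone.
    means = []
    for start, end in phone_bounds:
        voiced = [math.log(f) for f in f0_frames[start:end] if f > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    valid = [m for m in means if m is not None]
    if not valid:
        return [None] * len(means)

    # Uniform quantization between the observed minimum and maximum.
    lo, hi = min(valid), max(valid)
    step = (hi - lo) / num_levels or 1.0  # avoid a zero step for flat contours

    indexes = []
    for m in means:
        if m is None:
            indexes.append(None)
        else:
            # Clamp so the maximum value falls in the top level.
            indexes.append(min(int((m - lo) / step), num_levels - 1))
    return indexes

f0 = [0, 0, 100, 100, 150, 150, 400, 400]
bounds = [(0, 2), (2, 4), (4, 6), (6, 8)]
print(quantize_f0_context(f0, bounds))  # [None, 0, 1, 3]
```

In a context-dependent labeling scheme, each index would then be attached to its phone label in place of a manually annotated prosodic tag; how the indexes enter the label format is left open here because the abstract does not describe it.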

Citation

Pages: 578-581

Number of pages: 4

