Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion

Cited by: 0

Authors:

Nose, Takashi [1]

Kobayashi, Takao [2]

Affiliations:

[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan

[2] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 227, Japan

Source:

2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014

Keywords:

HMM-based speech synthesis; quantized F0 context; low bit-rate speech coding; voice conversion; HMM;

DOI:

10.1109/IIH-MSP.2014.149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a technique for language-independent prosody modeling that uses unsupervised prosodic labeling in HMM-based speech synthesis, and shows its applications to low bit-rate speech coding and speaker-independent voice conversion. In the proposed technique, sequences of prosodic features are coarsely quantized at the phone level, and the resulting indexes are used as the prosodic context for model training. Conventional HMM-based speech synthesis requires accurate prosodic labels corresponding to the speech samples, and manual correction is necessary to improve modeling accuracy, which sometimes incurs extra cost and limits its application. In contrast, the proposed technique creates the prosodic labels from the training data itself and can be applied not only to speech synthesis but also to speech coding and voice conversion. Subjective experimental results show the effectiveness of the quantized F0 context without manual prosodic labeling.
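
The phone-level quantization described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say which statistic is quantized or how the levels are chosen, so the use of the mean log F0 per phone, uniform bins over the utterance range, a reserved index 0 for unvoiced phones, and the names `phone_level_log_f0` and `quantize_f0_context` are all assumptions made for illustration.

```python
import math


def phone_level_log_f0(f0_hz, phone_spans):
    """Mean log F0 over the voiced frames of each phone.

    f0_hz: per-frame F0 in Hz, with 0 marking unvoiced frames (assumed convention).
    phone_spans: list of (start_frame, end_frame) pairs, end exclusive.
    Returns one value per phone, or None for a phone with no voiced frame.
    """
    means = []
    for start, end in phone_spans:
        voiced = [math.log(v) for v in f0_hz[start:end] if v > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)
    return means


def quantize_f0_context(f0_hz, phone_spans, n_levels=4):
    """Map each phone to a coarse F0 index usable as a context label.

    Index 0 is reserved for unvoiced phones; voiced phones get 1..n_levels
    from uniform bins between the lowest and highest phone-level mean
    (an assumed quantizer, chosen only for simplicity).
    """
    means = phone_level_log_f0(f0_hz, phone_spans)
    voiced_means = [m for m in means if m is not None]
    if not voiced_means:
        return [0] * len(means)
    lo, hi = min(voiced_means), max(voiced_means)
    # Fall back to a unit width when every voiced phone has the same mean.
    width = (hi - lo) / n_levels or 1.0
    labels = []
    for m in means:
        if m is None:
            labels.append(0)
        else:
            # Clamp so the maximum value lands in the top bin.
            labels.append(min(int((m - lo) / width), n_levels - 1) + 1)
    return labels


if __name__ == "__main__":
    f0 = [0, 0, 100, 100, 150, 150, 400, 400]
    spans = [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(quantize_f0_context(f0, spans, n_levels=4))
```

Because the indexes come from the speech data itself, a labeling step of this kind needs no manual prosodic annotation, which is the property the abstract emphasizes.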

Citation

Pages: 578-581

Page count: 4

50 items in total

[41] VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0
Yutani, Kaori
Uto, Yosuke
Nankaku, Yoshihiko
Lee, Akinobu
Tokuda, Keiichi
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3897 - 3900
[42] A Method for Automatically Estimating F0 Model Parameters and A Speech Re-Synthesis Tool Using F0 Model and STRAIGHT
Sato, Shota
Kimura, Taro
Horiuchi, Yasuo
Nishida, Masafumi
Kuroiwa, Shingo
Ichikawa, Akira
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 545 - +
[43] A STYLE CAPTURING APPROACH TO F0 TRANSFORMATION IN VOICE CONVERSION
Anumanchipalli, Gopala Krishna
Oliveira, Luis C.
Black, Alan W.
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6915 - 6919
[44] CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion
Patel, Maitreya
Purohit, Mirali
Shah, Jui
Patil, Hemant A.
28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 411 - 415
[45] Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based Speech Synthesis
Ogbureke, Kalu U.
Cabral, Joao P.
Carson-Berndsen, Julie
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 67 - 70
[46] A Novel Model of F0 Contours Prediction for Continuous Speech
胡文英
祖漪清
王志中
Journal of Shanghai Jiaotong University, 2005, (03) : 231 - 235
[47] Maximising objective speech intelligibility by local f0 modulation
Villegas, Julian
Cooke, Martin
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1702 - 1705
[48] F0 slope and mean: cues to speech segmentation in French
Cordero, Maria del Mar
Meunier, Fanny
Grimault, Nicolas
Pota, Stephane
Spinelli, Elsa
INTERSPEECH 2020, 2020, : 1610 - 1614
[49] TRANSFORMATION OF F0 CONTOURS FOR LEXICAL TONES IN CONCATENATIVE SPEECH SYNTHESIS OF TONAL LANGUAGES
Trung-Nghia Phung
Luong, Mai Chi
Akagi, Masato
2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 129 - 134
[50] JOINT MODELLING OF VOICING LABEL AND CONTINUOUS F0 FOR HMM BASED SPEECH SYNTHESIS
Yu, K.
Young, S.
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4572 - 4575