Robust Estimation of Multiple-Regression HMM Parameters for Dimension-Based Expressive Dialogue Speech Synthesis

Cited by: 0

Authors:

Nagata, Tomohiro [1]

Mori, Hiroki [1]

Nose, Takashi [2]

Affiliations:

[1] Utsunomiya Univ, Grad Sch Engn, Utsunomiya, Tochigi, Japan

[2] Tokyo Inst Technol, Grad Sch Sci & Engn, Tokyo, Japan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

HMM-based speech synthesis; spontaneous speech; paralinguistic information; UU Database; MRHSMM; MAP estimation; ADAPTATION; MODEL;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes spontaneous dialogue speech synthesis based on a multiple-regression hidden semi-Markov model (MRHSMM), which enables users to specify paralinguistic information of synthesized speech with a dimensional representation. Paralinguistic aspects of synthesized speech are controlled by multiple regression models whose explanatory variables are abstract dimensions such as pleasant-unpleasant and aroused-sleepy. For robust estimation of the regression matrices of the MRHSMM with unbalanced spontaneous dialogue speech samples, the re-estimation formulae were derived in the framework of maximum a posteriori (MAP) estimation. The result of a perceptual experiment confirmed that the naturalness of synthesized speech was improved by applying MAP estimation to the regression matrices. In addition, a high correlation (R ≃ 0.7) was observed between given and perceived paralinguistic information, which implies that the proposed method could successfully reflect intended paralinguistic messages in the synthesized speech.
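The abstract's two central ideas can be illustrated with a small sketch. In a multiple-regression model of this kind, a state's mean vector is typically written as mu = H xi, where xi = [1, v_1, ..., v_L] stacks a constant with the dimensional style values (for example pleasant-unpleasant and aroused-sleepy) and H is the regression matrix. The MAP step below is an assumption made for illustration only, not the paper's re-estimation formulae, which this record does not reproduce: it uses identity covariance, no state-occupancy weighting, and an isotropic Gaussian prior centred on a prior matrix, which gives H = (sum o xi^T + tau H0)(sum xi xi^T + tau I)^{-1}. All function names and the weight `tau` are hypothetical.

```python
def augment(style):
    """Return xi = [1, v_1, ..., v_L] for a style vector v."""
    return [1.0] + [float(v) for v in style]


def regression_mean(H, style):
    """Mean vector mu = H xi for one state (H is D x (L+1))."""
    xi = augment(style)
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]  # zero here means A is singular
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def map_regression_matrix(observations, styles, H_prior, tau):
    """Illustrative MAP estimate of H (assumed simplification, see lead-in).

    Solves H (sum xi xi^T + tau I) = sum o xi^T + tau H_prior.
    tau = 0 reduces to least squares; large tau pulls H toward H_prior.
    """
    D, K = len(H_prior), len(H_prior[0])
    A = [[tau if i == j else 0.0 for j in range(K)] for i in range(K)]
    # Right-hand side stored transposed (K x D) so one solve yields H^T.
    Bt = [[tau * H_prior[d][k] for d in range(D)] for k in range(K)]
    for o, v in zip(observations, styles):
        xi = augment(v)
        for i in range(K):
            for j in range(K):
                A[i][j] += xi[i] * xi[j]
            for d in range(D):
                Bt[i][d] += xi[i] * o[d]
    X = solve(A, Bt)  # X = H^T because A is symmetric
    return [[X[k][d] for k in range(K)] for d in range(D)]
```

With few or unbalanced samples, the term sum xi xi^T can be poorly conditioned; in this sketch the prior term tau I keeps the system solvable and shrinks the estimate toward H_prior, which is the general intuition behind using a MAP criterion for robustness.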


Pages: 1548-1552

Page count: 5
