Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing

Cited by: 66
Authors
Tachibana, M [1 ]
Yamagishi, J [1 ]
Masuko, T [1 ]
Kobayashi, T [1 ]
Affiliation
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Source
IEICE Transactions on Information and Systems, 2005, E88-D(11)
Keywords
HMM-based speech synthesis; speaking style; emotional expression; style interpolation; style morphing; hidden semi-Markov model (HSMM)
DOI
10.1093/ietisy/e88-d.11.2484
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
This paper describes an approach to generating speech with emotional expressivity and speaking-style variability, based on a technique for modeling speaking styles and emotional expressions in HMM-based speech synthesis. We first model several representative styles, each a speaking style and/or an emotional expression, within an HMM-based speech synthesis framework. Then, to generate synthetic speech whose style lies between the representative ones, we synthesize speech from a model obtained by interpolating the representative style models with a model interpolation technique. We assess the style interpolation technique through subjective evaluation tests using four representative styles of read speech, namely neutral, joyful, sad, and rough, together with speech synthesized from models obtained by interpolating every pair of these styles. The results show that speech synthesized from an interpolated model has a style in between the two representative ones. Moreover, the degree of expressivity of a speaking style or emotion in synthesized speech can be controlled by changing the interpolation ratio between the neutral model and another representative style model. Finally, we show that style morphing, i.e., smoothly changing the style from one representative style to another, can be achieved by gradually changing the interpolation ratio.
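The interpolation itself can be illustrated with a short sketch. The Python snippet below is a minimal illustration under simplifying assumptions: it linearly interpolates the mean vectors and diagonal variances of two corresponding Gaussian output distributions, whereas the paper interpolates full HSMM parameter sets; the function name and the toy parameter values are hypothetical, not taken from the paper.

    import numpy as np

    def interpolate_gaussians(mu_a, var_a, mu_b, var_b, ratio):
        # Linear interpolation of the mean and diagonal variance of two
        # corresponding Gaussian output distributions (hypothetical helper,
        # not the paper's code). ratio = 0.0 gives style A, 1.0 gives style B.
        mu = (1.0 - ratio) * mu_a + ratio * mu_b
        var = (1.0 - ratio) * var_a + ratio * var_b
        return mu, var

    # Toy stand-ins for one state's output distribution in a "neutral" and a
    # "joyful" style model (illustrative values only).
    mu_neutral = np.array([0.2, -0.1, 0.4])
    var_neutral = np.array([0.05, 0.04, 0.06])
    mu_joyful = np.array([0.6, 0.3, 0.1])
    var_joyful = np.array([0.08, 0.05, 0.07])

    # Controlling expressivity: a ratio of 0.5 yields a style halfway
    # between neutral and joyful.
    mu, var = interpolate_gaussians(mu_neutral, var_neutral,
                                    mu_joyful, var_joyful, ratio=0.5)

    # Style morphing: sweep the ratio gradually across successive segments
    # of an utterance so the style changes smoothly from one to the other.
    for ratio in np.linspace(0.0, 1.0, num=5):
        mu, var = interpolate_gaussians(mu_neutral, var_neutral,
                                        mu_joyful, var_joyful, ratio)

Applying such interpolation consistently across all model parameters is what lets a single interpolation ratio act as a continuous control over the strength of a style.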
Pages: 2484-2491
Page count: 8