FINE-GRAINED EMOTION STRENGTH TRANSFER, CONTROL AND PREDICTION FOR EMOTIONAL SPEECH SYNTHESIS

Cited by: 32
Authors
Lei, Yi [1]
Yang, Shan [1]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
Keywords
text-to-speech; expressive speech synthesis; emotion strength; sequence-to-sequence
DOI
10.1109/SLT48900.2021.9383524
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a unified model to conduct emotion transfer, control and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often needs manual labels or reference audio to determine the emotional expressions of synthesized speech. Such coarse labels cannot control the details of speech emotion, often resulting in an averaged emotion expression delivery, and it is also hard to choose suitable reference audio during inference. To conduct fine-grained emotion expression generation, we introduce phoneme-level emotion strength representations through a learned ranking function to describe the local emotion details, and the sentence-level emotion category is adopted to render the global emotions of synthesized speech. With the global render and local descriptors of emotions, we can obtain fine-grained emotion expressions from reference audio via its emotion descriptors (for transfer) or directly from phoneme-level manual labels (for control). As for the emotional speech synthesis with arbitrary text inputs, the proposed model can also predict phoneme-level emotion expressions from texts, which does not require any reference audio or manual label.
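The abstract does not state what form the learned ranking function takes, so the following is only an illustrative sketch of the general idea: learn a scoring function that ranks emotional segments above neutral ones, then use its normalized output as a per-phoneme strength value. Everything here is an assumption made for illustration, including the linear ranker, the pairwise hinge loss, the two-dimensional toy features, the min-max normalization, and the names `train_linear_ranker` and `strength_scores`; it is not the authors' implementation.

```python
import random


def train_linear_ranker(emotional, neutral, epochs=200, lr=0.05, margin=1.0, seed=0):
    """Learn weights w so that w.x for emotional samples exceeds w.x for
    neutral samples by a margin (pairwise hinge loss, subgradient updates).
    Hypothetical stand-in for the paper's learned ranking function."""
    rng = random.Random(seed)
    w = [0.0] * len(emotional[0])
    pairs = [(e, n) for e in emotional for n in neutral]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for e, n in pairs:
            diff = [a - b for a, b in zip(e, n)]
            score_gap = sum(wi * di for wi, di in zip(w, diff))
            if score_gap < margin:  # ranking constraint violated
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def strength_scores(w, segments):
    """Score each segment with the ranker and min-max normalize to [0, 1]."""
    raw = [sum(wi * xi for wi, xi in zip(w, x)) for x in segments]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]


# Toy acoustic features (assumed): higher values loosely mean "more emotional".
emotional = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6]]
neutral = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
w = train_linear_ranker(emotional, neutral)

# One feature vector per phoneme of a hypothetical utterance.
phoneme_features = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.9]]
scores = strength_scores(w, phoneme_features)
print(scores)
```

In a setup like this, the normalized scores would play the role of the local strength descriptors: taken from reference audio for transfer, set by hand for control, or predicted from text when no reference is available.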
Pages: 423-430
Page count: 8