FINE-GRAINED EMOTION STRENGTH TRANSFER, CONTROL AND PREDICTION FOR EMOTIONAL SPEECH SYNTHESIS

Cited by: 32
Authors
Lei, Yi [1]
Yang, Shan [1]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
Source
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021
Keywords
text-to-speech; expressive speech synthesis; emotion strength; sequence-to-sequence;
DOI
10.1109/SLT48900.2021.9383524
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a unified model to conduct emotion transfer, control and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often needs manual labels or reference audio to determine the emotional expressions of synthesized speech. Such coarse labels cannot control the details of speech emotion, often resulting in an averaged emotion expression delivery, and it is also hard to choose suitable reference audio during inference. To conduct fine-grained emotion expression generation, we introduce phoneme-level emotion strength representations through a learned ranking function to describe the local emotion details, and the sentence-level emotion category is adopted to render the global emotions of synthesized speech. With the global render and local descriptors of emotions, we can obtain fine-grained emotion expressions from reference audio via its emotion descriptors (for transfer) or directly from phoneme-level manual labels (for control). As for the emotional speech synthesis with arbitrary text inputs, the proposed model can also predict phoneme-level emotion expressions from texts, which does not require any reference audio or manual label.
Pages: 423-430
Page count: 8
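
The abstract mentions emotion strength representations obtained "through a learned ranking function" but gives no details of that function. As a rough illustration only, the sketch below shows one generic way such a ranking could be learned: a linear pairwise ranker trained with a hinge-style update so that emotional feature vectors score higher than neutral ones, followed by min-max normalisation of the scores to [0, 1]. The function names `train_ranker` and `strength`, the two-dimensional feature vectors, and the hyperparameters are all hypothetical and are not taken from the paper.

```python
# Minimal sketch of a pairwise linear ranking function (an assumption,
# not the paper's method). Pure standard-library Python.

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_ranker(pairs, dim, epochs=200, lr=0.1, margin=1.0):
    """Learn w so that dot(w, emotional) exceeds dot(w, neutral) by `margin`.

    `pairs` is a list of (emotional_features, neutral_features) tuples.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for emo, neu in pairs:
            diff = [e - n for e, n in zip(emo, neu)]
            if dot(w, diff) < margin:  # margin violated: move w towards diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def strength(w, segments):
    """Score each segment (e.g. one per phoneme) and rescale to [0, 1]."""
    raw = [dot(w, x) for x in segments]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]

# Hypothetical toy features: emotional samples vs. neutral samples.
emotional = [[2.0, 1.0], [3.0, 0.5]]
neutral = [[0.5, 1.0], [1.0, 0.8]]
pairs = [(e, n) for e in emotional for n in neutral]

w = train_ranker(pairs, dim=2)
scores = strength(w, [[0.5, 1.0], [1.5, 0.9], [3.0, 0.5]])
print(scores)
```

In this toy setup the normalised scores could serve as per-segment strength values: read from reference audio for transfer, set by hand for control, or predicted from text, which are the three uses the abstract describes.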