Controllable Context-aware Conversational Speech Synthesis

Cited by: 3
Authors
Cong, Jian [1, 2]
Yang, Shan [2]
Hu, Na [2]
Li, Guangzhi [2]
Xie, Lei [1]
Su, Dan [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Source
Keywords
Speech synthesis; Spontaneous speech; Conversational speech
DOI
10.21437/Interspeech.2021-412
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In spoken conversations, spontaneous behaviors such as filled pauses and prolongations occur frequently. Conversational partners also tend to align features of their speech with those of their interlocutor, a phenomenon known as entrainment. To produce human-like conversations, we propose a unified controllable spontaneous conversational speech synthesis framework that models both phenomena. Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pause and prolongation, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text. We then develop an algorithm based on the predictor to control the occurrence frequency of the behaviors, so that the synthesized speech can vary from less disfluent to more disfluent. To model speech entrainment at the acoustic level, we use a context acoustic encoder to extract a global style embedding from the previous speech, which conditions the synthesis of the current speech. Furthermore, since the current and previous utterances belong to different speakers in a conversation, we add a domain adversarial training module to eliminate speaker-related information in the acoustic encoder while maintaining style-related information. Experiments show that the proposed approach can synthesize realistic conversations and control the occurrences of the spontaneous behaviors naturally.
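The abstract does not spell out how the occurrence frequency is controlled, so the following is only a minimal sketch of one way a predictor's output could be used for that purpose, not the authors' algorithm. It assumes a hypothetical predictor that returns, for each token, probabilities over three labels (no behavior, filled pause, prolongation), and a hypothetical `rate` knob giving the target fraction of tokens that receive a spontaneous-behavior label. The names `LABELS`, `select_behaviors`, `probs`, and `rate` are illustrative assumptions.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract only names the two behaviors.
LABELS = ("none", "filled_pause", "prolongation")


def select_behaviors(probs: Sequence[Sequence[float]], rate: float) -> List[str]:
    """Assign one label per token so that about `rate` of tokens are disfluent.

    probs[i] holds assumed predictor probabilities for token i, ordered as in
    LABELS. `rate` in [0, 1] is the target fraction of labeled tokens.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    n_tokens = len(probs)
    # Number of tokens allowed to carry a behavior label (round half up).
    budget = int(rate * n_tokens + 0.5)
    # Rank tokens by the predicted probability that some behavior occurs.
    ranked = sorted(range(n_tokens), key=lambda i: 1.0 - probs[i][0], reverse=True)
    chosen = set(ranked[:budget])
    labels = []
    for i, p in enumerate(probs):
        if i in chosen:
            # Pick the more likely of the two behaviors for this token.
            labels.append(LABELS[1] if p[1] >= p[2] else LABELS[2])
        else:
            labels.append(LABELS[0])
    return labels


# Made-up probabilities for a four-token utterance.
example_probs = [
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.6, 0.1, 0.3],
    [0.1, 0.2, 0.7],
]
less_disfluent = select_behaviors(example_probs, 0.0)
medium = select_behaviors(example_probs, 0.5)
more_disfluent = select_behaviors(example_probs, 1.0)
```

Raising `rate` from 0 to 1 moves the labeled text from fully fluent to maximally disfluent while always placing behaviors on the tokens the predictor considers most likely, which is one plausible reading of "less disfluent to more disfluent" control.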
Pages: 4658-4662
Number of pages: 5