EMOTIONAL VOICE CONVERSION USING MULTITASK LEARNING WITH TEXT-TO-SPEECH

Authors
Kim, Tae-Ho [1]
Cho, Sungjae [2]
Choi, Shinkook [2]
Park, Sejik [1]
Lee, Soo-Young [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, KI Artificial Intelligence, Daejeon, South Korea
[2] Korea Adv Inst Sci & Technol, Informat & Elect Res Inst, Daejeon, South Korea
Keywords
voice conversion; text-to-speech; emotional voice conversion; multitask learning; networks
DOI
10.1109/icassp40776.2020.9053255
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice conversion (VC) is a task that alters the voice of a person to suit different styles while conserving the linguistic content. Previous state-of-the-art VC technology was based on the sequence-to-sequence (seq2seq) model, which could lose linguistic information. An earlier attempt to overcome this problem used textual supervision; however, it required explicit alignment, and the benefit of using a seq2seq model was therefore lost. In this study, a voice converter that utilizes multitask learning with text-to-speech (TTS) is presented. With multitask learning, VC is expected to capture linguistic information and preserve training stability. This method does not require explicit alignment to capture abundant text information. VC experiments were performed on a male Korean emotional text-speech dataset to convert neutral voice to emotional voice. The results showed that multitask learning helps to preserve the linguistic content.
Pages: 7774-7778
Page count: 5