SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION

Cited by: 0
Authors
Robinson, Carl [1 ]
Obin, Nicolas [1 ]
Roebel, Axel [1 ]
Affiliations
[1] Sorbonne Univ, CNRS, IRCAM, Paris, France
Keywords
speech emotion conversion; intonation; sequence-to-sequence models
DOI
10.1109/icassp.2019.8683865
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Voice interfaces are becoming wildly popular and are driving demand for more advanced speech synthesis and voice transformation systems. Current text-to-speech methods produce realistic-sounding voices, but they lack the emotional expressivity that listeners expect given the context of the interaction and the phrase being spoken. Emotional voice conversion is a research domain concerned with generating expressive speech from neutral synthesised speech or from a natural human voice. This research investigated the effectiveness of a sequence-to-sequence (seq2seq) encoder-decoder model for transforming the intonation of a human voice from neutral to expressive speech, with a preliminary introduction of linguistic conditioning. A subjective experiment, in which listeners performed speech emotion recognition, demonstrated that the proposed sequence-to-sequence models produce convincing voice emotion transformations. In particular, conditioning the model on the position of each syllable in the phrase significantly improved recognition rates.
Pages: 6830 - 6834
Page count: 5
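
The abstract describes the architecture only at a high level. As a rough illustration, the following is a minimal sketch, in PyTorch, of how a seq2seq encoder-decoder for neutral-to-expressive F0 conversion with syllable-position conditioning could be set up. All specifics here are assumptions made for illustration, not the authors' implementation: a GRU encoder and decoder, frame-level log-F0 features, a normalised scalar syllable-position feature, and teacher-forced training with an L1 loss.

import torch
import torch.nn as nn

class F0Seq2Seq(nn.Module):
    # Hypothetical model: encodes a neutral log-F0 contour and decodes an
    # expressive one, conditioning each decoder step on syllable position.
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        # Decoder input: previous target frame + syllable-position feature.
        self.decoder = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, 1)  # hidden state -> log-F0 value

    def forward(self, neutral_f0, syl_pos, target_f0):
        # neutral_f0, syl_pos, target_f0: (batch, frames, 1)
        _, state = self.encoder(neutral_f0)  # summarise the source contour
        # Teacher forcing: the decoder consumes the right-shifted target
        # contour (zero start frame) concatenated with syllable position.
        prev = torch.cat([torch.zeros_like(target_f0[:, :1]), target_f0[:, :-1]], dim=1)
        out, _ = self.decoder(torch.cat([prev, syl_pos], dim=-1), state)
        return self.proj(out)  # predicted expressive log-F0 contour

# Toy training step on one 200-frame phrase (random stand-in data).
model = F0Seq2Seq()
neutral = torch.randn(1, 200, 1)
pos = torch.linspace(0, 1, 200).view(1, 200, 1)  # syllable position in the phrase
target = torch.randn(1, 200, 1)
loss = nn.functional.l1_loss(model(neutral, pos, target), target)
loss.backward()

At inference time the decoder would run autoregressively, feeding back its own predictions in place of the target frames; the syllable-position input stands in for the linguistic conditioning that the abstract reports as significantly improving recognition rates.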