Speech synthesis of emotions using vowel features of a speaker

Cited by: 3
Authors
Boku, Kanu [1 ]
Asada, Taro [1 ]
Yoshitomi, Yasunari [1 ]
Tabuse, Masayoshi [1 ]
Affiliations
[1] Kyoto Prefectural Univ, Grad Sch Life & Environm Sci, Sakyo Ku, 1-5 Nakaragi Cho, Shimogamo, Kyoto 6068522, Japan
Keywords
Emotional speech; Feature parameter; Synthetic speech; Emotional synthetic speech; Vowel
DOI
10.1007/s10015-013-0126-9
Chinese Library Classification (CLC)
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Recently, methods for adding emotion to synthetic speech have received considerable attention in the field of speech synthesis research. We previously proposed a case-based method for generating emotional synthetic speech by exploiting the characteristics of the maximum amplitude and utterance duration of vowels, together with the fundamental frequency, of emotional speech. In the present study, we improve on that method by controlling the fundamental frequency of the emotional synthetic speech. As an initial investigation, we adopted the utterance of a Japanese name that is semantically neutral. Using the proposed method, emotional synthetic speech generated from the emotional speech of one male subject was discriminated with a mean accuracy of 83.9% when 18 subjects listened to synthetic utterances of "angry," "happy," "neutral," "sad," or "surprised" for the Japanese names "Taro" and "Hiroko." Further adjustment of the fundamental frequency in the proposed method gave the subjects a much clearer impression of the intended emotion in the synthetic speech.
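The abstract names three vowel-level feature parameters: maximum amplitude, utterance duration, and the fundamental frequency (F0). As a minimal illustrative sketch of F0 and amplitude control only, and not the authors' actual implementation, the Python snippet below analyzes a neutral utterance with the WORLD vocoder (via the pyworld bindings, an assumed stand-in; the paper does not name its synthesis tool), scales the F0 contour and amplitude by hypothetical per-emotion ratios, and resynthesizes the waveform.

```python
# Minimal sketch: resynthesize a neutral utterance with emotion-dependent
# F0 and amplitude scaling. pyworld/soundfile and the ratio values are
# illustrative assumptions, not the paper's case-based method.
import numpy as np
import pyworld as pw    # WORLD vocoder bindings (pip install pyworld)
import soundfile as sf  # audio I/O (pip install soundfile)

# Hypothetical per-emotion scaling ratios for F0 and amplitude, standing in
# for values that the case-based approach would measure from a speaker's
# recorded emotional speech.
EMOTION_RATIOS = {
    "angry":     {"f0": 1.4, "amp": 1.3},
    "happy":     {"f0": 1.2, "amp": 1.1},
    "neutral":   {"f0": 1.0, "amp": 1.0},
    "sad":       {"f0": 0.8, "amp": 0.8},
    "surprised": {"f0": 1.5, "amp": 1.2},
}

def synthesize_emotion(wav_in: str, wav_out: str, emotion: str) -> None:
    x, fs = sf.read(wav_in)
    x = x.astype(np.float64)  # WORLD expects float64 mono input
    r = EMOTION_RATIOS[emotion]

    # WORLD analysis: F0 contour, spectral envelope, aperiodicity.
    f0, t = pw.harvest(x, fs)
    sp = pw.cheaptrick(x, f0, t, fs)
    ap = pw.d4c(x, f0, t, fs)

    # Control the fundamental frequency of the synthetic speech by scaling
    # the contour (voiced frames only; unvoiced frames stay at 0).
    f0_mod = np.where(f0 > 0, f0 * r["f0"], 0.0)

    y = pw.synthesize(f0_mod, sp, ap, fs)
    y = np.clip(y * r["amp"], -1.0, 1.0)  # amplitude scaling, clip guard
    sf.write(wav_out, y, fs)

# Example: synthesize_emotion("taro_neutral.wav", "taro_angry.wav", "angry")
```

In the method described in the abstract, the ratios would be derived from the speaker's emotional speech rather than fixed by hand, and vowel utterance duration (the third feature parameter) would also be modified, e.g., by repeating or dropping WORLD analysis frames; both are omitted here for brevity.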
Pages: 27-32
Number of pages: 6
Related papers
50 records in total
  • [1] Speech synthesis of emotions using vowel features of a speaker
    Boku, K.
    Asada, T.
    Yoshitomi, Y.
    Tabuse, M.
    PROCEEDINGS OF THE EIGHTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 18TH '13), 2013, : 176 - 179
  • [2] Speech Synthesis of Emotions Using Vowel Features
    Boku, Kanu
    Asada, Taro
    Yoshitomi, Yasunari
    Tabuse, Masayoshi
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2013, 1 (01) : 54 - 67
  • [3] Speech Synthesis of Emotions in a Sentence Using Vowel Features
    Makino, Rintaro
    Yoshitomi, Yasunari
    Asada, Taro
    Tabuse, Masayoshi
    PROCEEDINGS OF THE 2020 INTERNATIONAL CONFERENCE ON ARTIFICIAL LIFE AND ROBOTICS (ICAROB2020), 2020, : 403 - 406
  • [4] Speech Synthesis of Emotions in a Sentence using Vowel Features
    Makino, Rintaro
    Yoshitomi, Yasunari
    Asada, Taro
    Tabuse, Masayoshi
    JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2020, 7 (02): : 107 - 110
  • [5] VOWEL AND SPEAKER IDENTIFICATION IN NATURAL AND SYNTHETIC SPEECH
    LEHISTE, I
    MELTZER, D
    LANGUAGE AND SPEECH, 1973, 16 (OCT-D) : 356 - 364
  • [6] VOWEL AND SPEAKER IDENTIFICATION IN NATURAL AND SYNTHETIC SPEECH
    MELTZER, D
    LEHISTE, I
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1972, 51 (01): : 131 - &
  • [7] Speaker identification using speech and lip features
    Ou, GB
    Li, X
    Yao, XC
    Jia, HB
    Murphey, YL
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 2565 - 2570
  • [8] SPEAKER-INDEPENDENT VOWEL RECOGNITION IN PERSIAN SPEECH
    Nazari, Mohammad
    Sayadiyan, Abolghasem
    Valiollahzadeh, Seyyed Majid
    2008 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES: FROM THEORY TO APPLICATIONS, VOLS 1-5, 2008, : 672 - 676
  • [9] Classification of Emotions from Speech using Implicit Features
    Srivastava, Mohit
    Agarwal, Anupam
    2014 9TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2014, : 266 - 271
  • [10] SPEAKER NORMALIZATION OF STATIC AND DYNAMIC VOWEL SPECTRAL FEATURES
    ZAHORIAN, SA
    JAGHARGHI, AJ
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1991, 90 (01): : 67 - 75