F0 Modeling for Isarn Speech Synthesis using Deep Neural Networks and Syllable-level Feature Representation

Cited by: 1
Authors
Janyoi, Pongsathon [1]
Seresangtakul, Pusadee [2]
Affiliations
[1] Khon Kaen Univ, Dept Comp Sci, Nat Language & Speech Proc Lab, Khon Kaen, Thailand
[2] Khon Kaen Univ, Dept Comp Sci, Fac Sci, Khon Kaen, Thailand
Keywords
Fundamental frequency; speech synthesis; deep neural networks; HMM; generation
DOI
10.34028/iajit/17/6/9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The generation of the fundamental frequency (F0) plays an important role in speech synthesis because F0 directly influences the naturalness of synthetic speech. In conventional parametric speech synthesis, F0 is predicted frame by frame. This approach cannot adequately represent F0 contours over larger units, especially the tone contours of syllables in tonal languages, which deviate as a result of long-term contextual dependency. This work proposes a syllable-level F0 model that represents F0 contours within syllables using syllable-level F0 parameters consisting of sampled F0 points and dynamic features. A Deep Neural Network (DNN) was used to model the relationship between syllable-level contextual features and syllable-level F0 parameters. The proposed model was evaluated in an Isarn speech synthesis system with both large and small training sets. For all training sets, the results of objective and subjective tests indicate that the proposed approach outperforms baseline systems based on hidden Markov models (HMMs) and on DNNs that predict F0 values at the frame level.
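To make the idea of a syllable-level F0 parameter vector (sampled F0 points plus dynamic features) more concrete, here is a minimal Python sketch. It is not the authors' implementation: the number of sample points (10), the use of log F0, dropping unvoiced frames marked as F0 = 0, linear interpolation, and central differences as the dynamic features are all assumptions, and the function names `resample_linear`, `syllable_f0_params`, and `reconstruct_frames` are hypothetical.

```python
import math
from typing import List


def resample_linear(values: List[float], n_points: int) -> List[float]:
    """Linearly interpolate a per-frame sequence onto n_points evenly spaced positions."""
    if n_points < 2 or len(values) < 2:
        raise ValueError("need at least 2 input frames and 2 output points")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional frame index in [0, last]
        i = min(int(pos), last - 1)        # left neighbour, clamped so i + 1 is valid
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


def syllable_f0_params(f0_hz: List[float], n_points: int = 10) -> List[float]:
    """Hypothetical syllable-level F0 vector: n_points static log-F0 samples + n_points deltas."""
    # Assumption: unvoiced frames are marked with F0 = 0 and simply dropped.
    log_f0 = [math.log(f) for f in f0_hz if f > 0]
    static = resample_linear(log_f0, n_points)
    # Assumed dynamic features: central difference between neighbouring sample
    # points, one-sided at the first and last point.
    delta = []
    for k in range(n_points):
        lo, hi = max(k - 1, 0), min(k + 1, n_points - 1)
        delta.append((static[hi] - static[lo]) / (hi - lo))
    return static + delta


def reconstruct_frames(static: List[float], n_frames: int) -> List[float]:
    """Interpolate the static log-F0 samples back to n_frames values in Hz."""
    return [math.exp(v) for v in resample_linear(static, n_frames)]


if __name__ == "__main__":
    # A rising contour from 100 Hz to 200 Hz over 21 frames, preceded by two unvoiced frames.
    contour = [0.0, 0.0] + [100.0 + 5.0 * i for i in range(21)]
    params = syllable_f0_params(contour)
    print(len(params))  # 20
    print([round(f, 1) for f in reconstruct_frames(params[:10], 5)])
```

In a setup like this, a DNN would be trained to map each syllable's contextual features to the fixed-length vector returned by `syllable_f0_params`, and at synthesis time the predicted static values would be interpolated back to the syllable's frame count, as `reconstruct_frames` does. A fixed-length target per syllable is what lets one prediction describe a whole tone contour rather than a single frame.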
Pages: 906-915
Page count: 10
Related papers (50 total)
  • [1] Improving F0 Prediction Using Bidirectional Associative Memories and Syllable-Level F0 Features for HMM-based Mandarin Speech Synthesis
    Gao, Li
    Ling, Zhen-Hua
    Chen, Ling-Hui
    Dai, Li-Rong
    2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2014, pp. 275-279
  • [2] Tonal Contour Generation for Isarn Speech Synthesis Using Deep Learning and Sampling-Based F0 Representation
    Janyoi, Pongsathon
    Seresangtakul, Pusadee
    Applied Sciences-Basel, 2020, 10 (18)
  • [3] Modeling F0 trajectories in hierarchically structured deep neural networks
    Yin, Xiang
    Lei, Ming
    Qian, Yao
    Soong, Frank K.
    He, Lei
    Ling, Zhen-Hua
    Dai, Li-Rong
    Speech Communication, 2016, 76, pp. 82-92
  • [4] Whisper to Normal Speech Based on Deep Neural Networks with MCC and F0 Features
    Lian, Hailun
    Hu, Yuting
    Zhou, Jian
    Wang, Huabin
    Tao, Liang
    2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), 2018
  • [5] F0 Modeling in HMM-Based Speech Synthesis System using Deep Belief Network
    Mukherjee, Sankar
    Mandal, Shyamal Kumar Das
    2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014
  • [6] Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features
    Ben Alex, Starlet
    Mary, Leena
    Babu, Ben P.
    Circuits, Systems, and Signal Processing, 2020, 39 (11), pp. 5681-5709
  • [7] Additive modeling of English F0 contour for speech synthesis
    Sakai, S
    2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-5: Speech Processing, 2005, pp. 277-280
  • [9] Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Ling, Zhen-Hua
    Dai, Li-Rong
    2010 IEEE 10th International Conference on Signal Processing Proceedings (ICSP2010), Vols I-III, 2010, pp. 613+
  • [10] Emotional Voice Conversion Using Deep Neural Networks with MCC and F0 Features
    Luo, Zhaojie
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016, pp. 977-981