Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1]
Hashimoto, Kei [1]
Oura, Keiichiro [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Funding
Japan Science and Technology Agency;
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM;
DOI
10.21437/Interspeech.2016-1027
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Singing voice synthesis techniques based on hidden Markov models (HMMs) have been proposed. In these approaches, the spectrum, excitation, and duration of singing voices are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices has not yet reached that of natural singing voices. Deep neural networks (DNNs) have substantially improved on conventional approaches in various research areas, including speech recognition, image recognition, and speech synthesis. DNN-based text-to-speech (TTS) synthesis can synthesize high-quality speech. In a DNN-based TTS system, a DNN is trained to represent the mapping function from contextual features to acoustic features, which are modeled by decision-tree-clustered context-dependent HMMs in an HMM-based TTS system. In this paper, we propose singing voice synthesis based on a DNN and evaluate its effectiveness. The relationship between the musical score and its acoustic features is modeled frame by frame by a DNN. To cope with the sparseness of pitch contexts in the database, musical-note-level pitch normalization and linear interpolation are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
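The abstract names two preparation steps for the excitation features, linear interpolation and musical-note-level pitch normalization, but does not spell out how they are computed. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that log F0 is interpolated linearly across unvoiced frames and that the normalized target is the difference between log F0 and the log frequency of the score note. All function names, the per-frame MIDI note representation, and the A4 = 440 Hz tuning are assumptions made for illustration.

```python
import math


def midi_to_log_f0(midi_note):
    """Natural log of the frequency in Hz of a MIDI note (assumes A4 = 69 = 440 Hz)."""
    return math.log(440.0) + (midi_note - 69) * math.log(2.0) / 12.0


def interpolate_unvoiced(log_f0, voiced):
    """Fill unvoiced frames by linear interpolation between voiced neighbours.

    Frames before the first or after the last voiced frame copy the nearest
    voiced value (an assumption; the abstract does not cover edge handling).
    """
    idx = [i for i, v in enumerate(voiced) if v]
    if not idx:
        raise ValueError("need at least one voiced frame")
    out = list(log_f0)
    for i in range(idx[0]):
        out[i] = log_f0[idx[0]]
    for i in range(idx[-1] + 1, len(out)):
        out[i] = log_f0[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1.0 - w) * log_f0[a] + w * log_f0[b]
    return out


def normalize_by_note(log_f0, note_per_frame):
    """Express log F0 as an offset from the score note active in each frame."""
    return [x - midi_to_log_f0(n) for x, n in zip(log_f0, note_per_frame)]


def denormalize_by_note(residual, note_per_frame):
    """Inverse of normalize_by_note: add the score pitch back at synthesis time."""
    return [r + midi_to_log_f0(n) for r, n in zip(residual, note_per_frame)]


if __name__ == "__main__":
    a4 = midi_to_log_f0(69)
    log_f0 = [a4, 0.0, 0.0, a4 + 0.3]      # two unvoiced frames in the middle
    voiced = [True, False, False, True]
    notes = [69, 69, 69, 69]               # hypothetical per-frame score notes
    filled = interpolate_unvoiced(log_f0, voiced)
    print(normalize_by_note(filled, notes))
```

Under this reading, a model trained on the offsets would not need every absolute pitch to appear in the training data, because the score pitch is added back with `denormalize_by_note` at synthesis time.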
Pages: 2478-2482
Number of pages: 5
Related papers
50 in total
  • [31] Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation
    Pyykkonen, Pyry
    Mimilakis, Stylianos I.
    Drossos, Konstantinos
    Virtanen, Tuomas
    2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2020,
  • [32] Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning
    Miyamoto, Sou
    Nose, Takashi
    Ito, Suzunosuke
    Koike, Harunori
    Chiba, Yuya
    Ito, Akinori
    Shinozaki, Takahiro
    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PT II, 2018, 82 : 97 - 103
  • [33] Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations
    Kotani, Gaku
    Saito, Daisuke
    Minematsu, Nobuaki
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2981 - 2992
  • [34] Expression Control in Singing Voice Synthesis
    Umbert, Marti
    Bonada, Jordi
    Goto, Masataka
    Nakano, Tomoyasu
    Sundberg, Johan
    IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (06) : 55 - 73
  • [35] Voice conversion based on deep neural networks for time-variant linear transformations
    Kotani, Gaku
    Saito, Daisuke
    Minematsu, Nobuaki
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1218 - 1221
  • [36] PARAMETRIC EMOTIONAL SINGING VOICE SYNTHESIS
    Park, Younsung
    Yun, Sungrack
    Yoo, Chang D.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4814 - 4817
  • [37] Pitch Preservation in Singing Voice Synthesis
    Liu, Shujun
    Zhu, Hai
    Wang, Kun
    Wang, Huajun
    arXiv, 2021,
  • [38] A singing voice database in Basque for statistical singing synthesis of bertsolaritza
    Sarasola, Xabier
    Navas, Eva
    Tavarez, David
    Erro, Daniel
    Saratxaga, Ibon
    Hernaez, Inma
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 756 - 759
  • [39] SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System
    Zhao, Junchuan
    Chetwin, Low Qi Hong
    Wang, Ye
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2641 - 2653
  • [40] SINGING INFORMATION PROCESSING BASED ON SINGING VOICE MODELING
    Goto, Masataka
    Saitou, Takeshi
    Nakano, Tomoyasu
    Fujihara, Hiromasa
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5506 - 5509