Singing voice synthesis based on deep neural networks

Cited by: 55
Authors
Nishimura, Masanari [1]
Hashimoto, Kei [1]
Oura, Keiichiro [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliation
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Funding
Japan Science and Technology Agency
Keywords
Singing voice synthesis; Neural network; DNN; Acoustic model; HMM
DOI
10.21437/Interspeech.2016-1027
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
Singing voice synthesis techniques based on hidden Markov models (HMMs) have been proposed. In these approaches, the spectrum, excitation, and duration of singing voices are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. However, the quality of the synthesized singing voices has still not reached that of natural singing voices. Deep neural networks (DNNs) have substantially improved on conventional approaches in various research areas, including speech recognition, image recognition, and speech synthesis. DNN-based text-to-speech (TTS) synthesis can produce high-quality speech. In a DNN-based TTS system, a DNN is trained to represent the mapping function from contextual features to acoustic features, a mapping that the HMM-based TTS system models with decision-tree-clustered context-dependent HMMs. In this paper, we propose singing voice synthesis based on a DNN and evaluate its effectiveness. The relationship between the musical score and its acoustic features is modeled frame by frame by a DNN. To cope with the sparseness of pitch contexts in the database, musical-note-level pitch normalization and linear-interpolation techniques are used to prepare the excitation features. Subjective experimental results show that the DNN-based system outperformed the HMM-based system in terms of naturalness.
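The abstract does not give exact formulas for preparing the excitation features. The Python sketch below shows one plausible reading, under stated assumptions: unvoiced frames are marked with F0 = 0 and filled by linear interpolation in the log-F0 domain, and note-level normalization subtracts the log F0 of the score note aligned to each frame, so that a DNN would only need to model the deviation from the written pitch. The function names, the frame-aligned MIDI note-number input, and the 440 Hz A4 reference are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def interpolate_log_f0(f0_hz: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return a continuous log-F0 contour and a voiced (1) / unvoiced (0) flag per frame.

    Assumption: frames with f0 <= 0 are unvoiced. Interior gaps are filled by a
    straight line in log F0 between the nearest voiced frames; leading and
    trailing gaps hold the nearest voiced value.
    """
    voiced = [1 if f > 0 else 0 for f in f0_hz]
    idx = [i for i, v in enumerate(voiced) if v]
    if not idx:
        raise ValueError("no voiced frames to interpolate from")
    lf0 = [math.log(f) if f > 0 else 0.0 for f in f0_hz]
    out = lf0[:]
    # Leading and trailing unvoiced frames: hold the nearest voiced value.
    for i in range(idx[0]):
        out[i] = lf0[idx[0]]
    for i in range(idx[-1] + 1, len(lf0)):
        out[i] = lf0[idx[-1]]
    # Interior unvoiced runs: linear interpolation between the voiced neighbours.
    for a, b in zip(idx, idx[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1.0 - t) * lf0[a] + t * lf0[b]
    return out, voiced


def midi_to_log_f0(note: int) -> float:
    """Log F0 of a MIDI note number, assuming equal temperament with A4 = 69 = 440 Hz."""
    return math.log(440.0 * 2.0 ** ((note - 69) / 12.0))


def normalize_by_note(lf0: Sequence[float], notes: Sequence[int]) -> List[float]:
    """Subtract the score note's log F0 from each frame (hypothetical helper).

    `notes` holds one MIDI note number per frame; rests would need a note value
    assigned beforehand and are not handled here.
    """
    if len(lf0) != len(notes):
        raise ValueError("lf0 and notes must have the same number of frames")
    return [x - midi_to_log_f0(n) for x, n in zip(lf0, notes)]


def denormalize_by_note(residual: Sequence[float], notes: Sequence[int]) -> List[float]:
    """Add the score note's log F0 back, e.g. to a predicted residual at synthesis time."""
    if len(residual) != len(notes):
        raise ValueError("residual and notes must have the same number of frames")
    return [r + midi_to_log_f0(n) for r, n in zip(residual, notes)]
```

For example, `interpolate_log_f0([0.0, 440.0, 0.0, 0.0, 880.0, 0.0])` fills the two interior frames one third and two thirds of the way (in log F0) from 440 Hz to 880 Hz. Normalizing against the note number 69 then gives a residual of 0 at the 440 Hz frame and ln 2 at the 880 Hz frame. Because the target becomes a deviation from the written note rather than an absolute pitch, a note that is rare in the training data can plausibly reuse what the model learned around other notes, which is one reading of how normalization addresses the sparseness of pitch contexts.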
Pages: 2478-2482
Number of pages: 5