FAST AND HIGH-QUALITY SINGING VOICE SYNTHESIS SYSTEM BASED ON CONVOLUTIONAL NEURAL NETWORKS

Authors
Nakamura, Kazuhiro [1]
Takaki, Shinji [1, 2]
Hashimoto, Kei [1, 2]
Oura, Keiichiro [1, 2]
Nankaku, Yoshihiko [2]
Tokuda, Keiichi [1, 2]
Affiliations
[1] Technospeech Inc, Dept Res & Dev, Nagoya, Aichi, Japan
[2] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi, Japan
Keywords
Singing voice synthesis; statistical model; acoustic modeling; convolutional neural network; computational complexity reduction
DOI
10.1109/icassp40776.2020.9053811
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs. An acoustic feature sequence is generated for each segment that consists of long-term frames, and a natural trajectory is obtained without the parameter generation algorithm. Furthermore, a computational complexity reduction technique, which drives the DNNs in different time units depending on the type of musical score features, is proposed. Experimental results show that the proposed method can synthesize natural-sounding singing voices much faster than the conventional method.
Pages: 7239-7243
Page count: 5