FAST AND HIGH-QUALITY SINGING VOICE SYNTHESIS SYSTEM BASED ON CONVOLUTIONAL NEURAL NETWORKS

Authors
Nakamura, Kazuhiro [1]
Takaki, Shinji [1, 2]
Hashimoto, Kei [1, 2]
Oura, Keiichiro [1, 2]
Nankaku, Yoshihiko [2]
Tokuda, Keiichi [1, 2]
Affiliations
[1] Technospeech Inc, Dept Res & Dev, Nagoya, Aichi, Japan
[2] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi, Japan
Keywords
Singing voice synthesis; statistical model; acoustic modeling; convolutional neural network; computational complexity reduction
DOI
10.1109/icassp40776.2020.9053811
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs. An acoustic feature sequence is generated for each segment that consists of long-term frames, and a natural trajectory is obtained without the parameter generation algorithm. Furthermore, a computational complexity reduction technique, which drives the DNNs in different time units depending on the type of musical score features, is proposed. Experimental results show that the proposed method can synthesize natural-sounding singing voices much faster than the conventional method.
Pages: 7239-7243
Page count: 5