XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

Cited by: 25
Authors
Lu, Peiling [1]
Wu, Jie [1]
Luan, Jian [1]
Tan, Xu [2]
Zhou, Li [1]
Affiliations
[1] Xiaoice, Microsoft Software Technol Ctr Asia, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
Keywords
singing voice synthesis; integrated modeling; XiaoiceSing; singing F0 modeling; singing duration modeling;
DOI
10.21437/Interspeech.2020-1410
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104 ; 100213 ;
Abstract
This paper presents XiaoiceSing, a high-quality singing voice synthesis system which employs an integrated network for spectrum, F0 and duration modeling. We follow the main architecture of FastSpeech while proposing several singing-specific designs: 1) Besides phoneme ID and position encoding, features from the musical score (e.g., note pitch and length) are also added. 2) To attenuate off-key issues, we add a residual connection in F0 prediction. 3) In addition to the duration loss of each phoneme, the durations of all the phonemes in a musical note are accumulated to calculate the syllable duration loss for rhythm enhancement. Experimental results show that XiaoiceSing outperforms the baseline system of convolutional neural networks by 1.44 MOS on sound quality, 1.18 on pronunciation accuracy, and 1.38 on naturalness. In two A/B tests, the proposed F0 and duration modeling methods achieve 97.3% and 84.3% preference rates over the baseline, respectively, which demonstrates the overwhelming advantages of XiaoiceSing.
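Design points 2) and 3) of the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of mean squared error, the `syllable_weight` parameter, and the choice to add the residual directly to the note pitch are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def duration_loss(pred_dur, true_dur, note_ids, syllable_weight=1.0):
    """Phoneme-level duration loss plus a note-level (syllable) duration loss.

    note_ids[i] is the index of the musical note that phoneme i belongs to.
    MSE and syllable_weight are illustrative assumptions.
    """
    phoneme_loss = mse(pred_dur, true_dur)

    # Accumulate the durations of all phonemes that share a note.
    pred_sum = defaultdict(float)
    true_sum = defaultdict(float)
    for p, t, n in zip(pred_dur, true_dur, note_ids):
        pred_sum[n] += p
        true_sum[n] += t

    notes = sorted(pred_sum)
    syllable_loss = mse([pred_sum[n] for n in notes],
                        [true_sum[n] for n in notes])
    return phoneme_loss + syllable_weight * syllable_loss


def f0_with_residual(note_pitch, predicted_residual):
    """Residual connection: the network output is an offset on the score pitch."""
    return [p + r for p, r in zip(note_pitch, predicted_residual)]


if __name__ == "__main__":
    # Two phonemes in note 0, one phoneme in note 1 (durations in frames).
    print(duration_loss([3.0, 5.0, 4.0], [4.0, 4.0, 4.0], [0, 0, 1]))
    print(f0_with_residual([60.0, 62.0], [0.5, -0.25]))
```

In the usage example the two phoneme errors inside note 0 cancel, so only the phoneme-level term contributes; a prediction that shortens the whole note would be penalized by both terms, which is the rhythm-preserving effect the abstract describes.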
Pages: 1306-1310
Number of pages: 5