SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System

Cited by: 0
Authors
Zhao, Junchuan [1]
Chetwin, Low Qi Hong [1]
Wang, Ye [1]
Affiliations
[1] National University of Singapore, School of Computing, Singapore 119260, Singapore
Keywords
Hidden Markov models; Annotations; Timbre; Task analysis; Deep learning; Synthesizers; Controllability; Singing voice synthesis; singing voice synthesis conditioned on singing techniques; singing technique classification; singing technique recommendation; metric; deep learning; EXPRESSION
DOI
10.1109/TASLP.2024.3394769
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The precise control of singing techniques is of utmost importance in achieving emotionally expressive vocal performances. To bridge the gap between current Singing Voice Synthesis (SVS) systems and human singers, our paper focuses on developing an SVS system that allows for control over singing techniques. In this paper, we introduce SinTechSVS, a singing technique controllable SVS system composed of a singing technique annotator, a singing technique controllable synthesizer, and a singing technique recommender. Our approach leverages transfer learning for efficient singing technique annotation and adapts the DiffSinger framework with additional style encoders and an attention-based singing technique local score (STLS) module to enhance singing technique controllability. We also propose a Seq2Seq singing technique recommender for the new task of Singing Technique Recommendation (STR). Experimental results demonstrate that SinTechSVS significantly improves the quality and expressiveness of synthesized vocal performances, with comparable general synthesis capabilities to state-of-the-art SVS systems and enhanced control over singing techniques, as evidenced by objective and subjective evaluations. To the best of our knowledge, SinTechSVS is the first SVS capable of controlling singing techniques.
Pages: 2641-2653
Number of pages: 13