SinTechSVS: A Singing Technique Controllable Singing Voice Synthesis System

Cited by: 0

Authors:

Zhao, Junchuan [1]

Chetwin, Low Qi Hong [1]

Wang, Ye [1]

Affiliations:

[1] Natl Univ Singapore, Sch Comp, Singapore 119260, Singapore

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Hidden Markov models; Annotations; Timbre; Task analysis; Deep learning; Synthesizers; Controllability; Singing voice synthesis; singing voice synthesis conditioned on singing techniques; singing technique classification; singing technique recommendation; metric; deep learning; EXPRESSION;

DOI:

10.1109/TASLP.2024.3394769

Chinese Library Classification (CLC):

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

The precise control of singing techniques is of utmost importance in achieving emotionally expressive vocal performances. To bridge the gap between current Singing Voice Synthesis (SVS) systems and human singers, our paper focuses on developing an SVS system that allows for control over singing techniques. In this paper, we introduce SinTechSVS, a singing technique controllable SVS system composed of a singing technique annotator, a singing technique controllable synthesizer, and a singing technique recommender. Our approach leverages transfer learning for efficient singing technique annotation and adapts the DiffSinger framework with additional style encoders and an attention-based singing technique local score (STLS) module to enhance singing technique controllability. We also propose a Seq2Seq singing technique recommender for the new task of Singing Technique Recommendation (STR). Experimental results demonstrate that SinTechSVS significantly improves the quality and expressiveness of synthesized vocal performances, with comparable general synthesis capabilities to state-of-the-art SVS systems and enhanced control over singing techniques, as evidenced by objective and subjective evaluations. To the best of our knowledge, SinTechSVS is the first SVS capable of controlling singing techniques.


Pages: 2641-2653

Page count: 13
