Multi-Voice Singing Synthesis From Lyrics

Times cited: 2

Authors:

Resna, S. [1]

Rajan, Rajeev [2]

Affiliations:

[1] Tata Elxsi, MultiMedia & Commun Vert, Technopk, Thiruvananthapuram, Kerala, India

[2] APJ Abdul Kalam Technol Univ, Dept Elect & Commun Engn, Coll Engn, Thiruvananthapuram, Kerala, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / No. 01

Keywords:

Multi-speaker; Text-to-singing conversion; Singing voice synthesis; Phonetic quality

DOI:

10.1007/s00034-022-02122-3

CLC classification:

TM [Electrical technology]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract:

In this paper, a multi-voice singing synthesis framework is proposed to convert lyrics into a sung version in the target speaker's voice. The framework consists of three blocks: a text-to-speech (TTS) module, a speech-to-singing (STS) module, and an intelligibility enhancement module. In the front end, a TTS converter generates speech from the lyrics in the target speaker's voice. The STS module then uses an encoder-decoder model to synthesize a sung version that follows the target melody. Finally, the intelligibility enhancement module, based on an audio style transfer scheme, improves phonetic intelligibility. The proposed system is systematically evaluated on the LibriSpeech and NUS-48E corpora with both subjective and objective measures. We compare our model with a state-of-the-art multi-voice singing synthesis model based on a generative adversarial network (GAN). Our study shows that the proposed model performs on par with this baseline without requiring any phoneme annotations.
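The three-block data flow described in the abstract can be sketched as a plain composition of stages. This is a minimal, non-authoritative sketch: the class name `MultiVoiceSingingPipeline`, the stage signatures, the list-of-floats waveform type, the melody given as MIDI note numbers, and the enhancement stage receiving the intermediate speech are all hypothetical assumptions, not details taken from the paper. It only shows the order in which the TTS, STS, and intelligibility enhancement blocks hand data to one another; the toy stand-ins do no real audio processing.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: a waveform is a plain list of float samples, to keep the sketch dependency-free.
Waveform = List[float]


@dataclass
class MultiVoiceSingingPipeline:
    """Hypothetical chaining of the three blocks named in the abstract.

    Each stage is injected as a callable, so nothing here reflects how the
    authors actually implement the TTS, STS, or enhancement modules.
    """

    tts: Callable[[str, str], Waveform]                 # (lyrics, speaker_id) -> speech
    sts: Callable[[Waveform, Sequence[int]], Waveform]  # (speech, melody) -> singing
    enhance: Callable[[Waveform, Waveform], Waveform]   # (singing, speech) -> enhanced singing

    def synthesize(self, lyrics: str, speaker_id: str, melody: Sequence[int]) -> Waveform:
        # 1. Front end: speak the lyrics in the target speaker's voice.
        speech = self.tts(lyrics, speaker_id)
        # 2. Convert the synthesized speech into a sung version that follows the melody.
        singing = self.sts(speech, melody)
        # 3. Improve phonetic intelligibility. Passing the speech along as a
        #    reference signal is an assumption; the abstract does not say
        #    which inputs the style-transfer-based module consumes.
        return self.enhance(singing, speech)


if __name__ == "__main__":
    # Toy stand-ins that only demonstrate the data flow.
    pipeline = MultiVoiceSingingPipeline(
        tts=lambda lyrics, speaker: [0.0] * len(lyrics),
        sts=lambda speech, melody: speech[: len(melody)],
        enhance=lambda singing, speech: singing,
    )
    print(len(pipeline.synthesize("twinkle", "speaker_42", [60, 60, 67, 67])))  # 4
```

Injecting the stages as callables keeps the sketch agnostic about the concrete TTS engine, the encoder-decoder STS model, and the style-transfer enhancer, so any real implementation could be swapped in behind the same three-step order.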


Pages: 307-321

Number of pages: 15
