Multi-Voice Singing Synthesis From Lyrics

Cited by: 2
Authors
Resna, S. [1 ]
Rajan, Rajeev [2 ]
Affiliations
[1] Tata Elxsi, MultiMedia & Commun Vert, Technopk, Thiruvananthapuram, Kerala, India
[2] APJ Abdul Kalam Technol Univ, Dept Elect & Commun Engn, Coll Engn, Thiruvananthapuram, Kerala, India
Keywords
Multi-speaker; Text-to-singing conversion; Singing voice synthesis; Phonetic quality;
DOI
10.1007/s00034-022-02122-3
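Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes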
0808 ; 0809 ;
Abstract
In this paper, a multi-voice singing synthesis framework is proposed to convert lyrics to their sung version in the target speaker's voice. It consists of three blocks: a text-to-speech (TTS) module, a speech-to-singing (STS) module, and an intelligibility enhancement module. In the front end, a TTS converter generates synthesized speech from the lyrics in the target speaker's voice. The STS module then synthesizes a sung version in the target melody through an encoder-decoder model. Finally, an intelligibility enhancement module based on an audio style transfer scheme improves phonetic intelligibility. The proposed system is evaluated on the LibriSpeech and NUS-48E corpora using both subjective and objective measures. We compare our model with a state-of-the-art multi-voice singing synthesis model based on a generative adversarial network (GAN). Our study shows that the proposed model performs on par with the baseline model without using any phoneme annotations.
Pages: 307-321
Number of pages: 15