WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

Cited by: 52

Authors:

Chandna, Pritish [1]

Blaauw, Merlijn [1]

Bonada, Jordi [1]

Gomez, Emilia [1,2]

Affiliations:

[1] Univ Pompeu Fabra, Mus Technol Grp, Barcelona, Spain

[2] European Commiss, Joint Res Ctr, Seville, Spain

Source:

2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2019

Funding:

European Union Horizon 2020;

Keywords:

Wasserstein-GAN; DCGAN; WORLD vocoder; Singing Voice Synthesis; Block-wise Predictions;

DOI:

10.23919/eusipco.2019.8903099

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We present a deep neural network based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling, to separate the influence of pitch and timbre. This facilitates the modelling of the large variability of pitch in the singing voice. Our network takes as input a block of consecutive frame-wise linguistic and fundamental frequency features, along with a global singer identity, and outputs the vocoder features corresponding to that block. This block-wise approach, together with the training methodology, allows us to model temporal dependencies within the features of the input block. For inference, sequential blocks are concatenated using an overlap-add procedure. Using objective metrics and a subjective listening test, we show that the performance of our model is competitive with the state of the art and with the original sample. We also present examples of the synthesis on a supplementary website and the source code via GitHub.
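
The overlap-add step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the block length, the hop between blocks, or the cross-fade window, so everything below is an assumption: `overlap_add`, `triangular_weights`, and `hop` are hypothetical names, and the triangular weighting is one common choice rather than the authors' confirmed method.

```python
def triangular_weights(block_len):
    """Strictly positive triangular weights that peak mid-block (assumed window)."""
    return [min(i + 1, block_len - i) for i in range(block_len)]


def overlap_add(blocks, hop):
    """Merge equal-length blocks of feature frames into one sequence.

    blocks: list of blocks; each block is a list of frames, and each frame
            is a list of floats (e.g. vocoder features for one time step).
    hop:    frame offset between the starts of consecutive blocks
            (hypothetical parameter; must satisfy 0 < hop <= block length).
    """
    block_len = len(blocks[0])
    n_feats = len(blocks[0][0])
    if not 0 < hop <= block_len:
        raise ValueError("hop must be in (0, block_len]")

    total = hop * (len(blocks) - 1) + block_len
    weights = triangular_weights(block_len)

    acc = [[0.0] * n_feats for _ in range(total)]  # weighted feature sums
    norm = [0.0] * total                           # per-frame weight sums

    for b, block in enumerate(blocks):
        start = b * hop
        for t, frame in enumerate(block):
            w = weights[t]
            norm[start + t] += w
            for k, value in enumerate(frame):
                acc[start + t][k] += w * value

    # Divide by the accumulated weight so overlapping regions are averaged.
    return [[v / norm[t] for v in frame] for t, frame in enumerate(acc)]
```

With a hop smaller than the block length, each output frame in an overlap region is a weighted average of the predictions from the blocks that cover it, which smooths the transitions between consecutive blocks.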


Pages: 5

