WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

Cited by: 52
Authors
Chandna, Pritish [1]
Blaauw, Merlijn [1]
Bonada, Jordi [1]
Gomez, Emilia [1, 2]
Affiliations
[1] Universitat Pompeu Fabra, Music Technology Group, Barcelona, Spain
[2] European Commission, Joint Research Centre, Seville, Spain
Funding
European Union Horizon 2020
Keywords
Wasserstein-GAN; DCGAN; WORLD vocoder; Singing Voice Synthesis; Block-wise Predictions
DOI
10.23919/eusipco.2019.8903099
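Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809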
Abstract
We present a deep-neural-network-based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Network (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling to separate the influence of pitch and timbre, which facilitates modelling the large pitch variability of the singing voice. Our network takes as input a block of consecutive frame-wise linguistic and fundamental frequency features, along with the global singer identity, and outputs the vocoder features corresponding to that block. This block-wise approach, together with the training methodology, allows us to model temporal dependencies within the features of the input block. For inference, sequential blocks are concatenated using an overlap-add procedure. Using objective metrics and a subjective listening test, we show that the performance of our model is competitive with respect to the state of the art and to the original sample. Synthesis examples are presented on a supplementary website, and the source code is available via GitHub.
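The abstract states that consecutive predicted blocks are joined at inference with an overlap-add procedure. The sketch below shows one way such a step could work, assuming a fixed block length, a fixed hop (in frames) no larger than the block, and a triangular cross-fade window normalised by the summed weights. The window shape, the hop, and the function name `overlap_add` are illustrative assumptions, not details taken from the paper or its GitHub code.

```python
from typing import List

Frame = List[float]   # one frame of vocoder features
Block = List[Frame]   # consecutive frames predicted together


def overlap_add(blocks: List[Block], hop: int) -> List[Frame]:
    """Merge equal-length, overlapping blocks into one frame sequence.

    Hypothetical sketch: block b is assumed to start at frame b * hop, and
    each frame is a weighted average of every block that covers it, using a
    triangular window so neighbouring blocks cross-fade instead of hard-cutting.
    """
    if not blocks:
        return []
    block_len = len(blocks[0])
    n_feats = len(blocks[0][0])
    if not 1 <= hop <= block_len:
        # A hop larger than the block would leave frames with no prediction.
        raise ValueError("hop must be between 1 and the block length")
    # Assumed triangular window: smallest at block edges, largest in the
    # middle, and strictly positive so every covered frame has a non-zero sum.
    window = [float(min(i + 1, block_len - i)) for i in range(block_len)]
    total = hop * (len(blocks) - 1) + block_len
    acc = [[0.0] * n_feats for _ in range(total)]
    weight_sum = [0.0] * total
    for b, block in enumerate(blocks):
        if len(block) != block_len:
            raise ValueError("all blocks must have the same length")
        start = b * hop
        for i, frame in enumerate(block):
            t = start + i
            weight_sum[t] += window[i]
            for k, value in enumerate(frame):
                acc[t][k] += window[i] * value
    # Normalise by the summed window weights (weighted overlap-add).
    return [[v / weight_sum[t] for v in row] for t, row in enumerate(acc)]


if __name__ == "__main__":
    # Two constant blocks of 4 frames (1 feature each) with 50% overlap.
    first = [[1.0] for _ in range(4)]
    second = [[3.0] for _ in range(4)]
    print(overlap_add([first, second], hop=2))
```

In the example, frames 0 and 1 stay at 1.0, frames 4 and 5 stay at 3.0, and the two overlapping frames fade from one block to the other (about 1.67 and 2.33). Because the result is normalised by the summed weights, blocks that agree on their overlap reproduce it unchanged; any positive window would work, and the triangular one is only an illustrative choice.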
Pages: 5
Related papers
50 in total
  • [1] Multi-Voice Singing Synthesis From Lyrics
    Resna, S.
    Rajan, Rajeev
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (01) : 307 - 321
  • [3] Word Intelligibility in Multi-voice Singing: The Influence of Chorus Size
    Condit-Schultz, Nathaniel
    Huron, David
    JOURNAL OF VOICE, 2017, 31 (01) : 121.e1 - 121.e8
  • [4] Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN
    Cho, Yin-Ping
    Tsao, Yu
    Wang, Hsin-Min
    Liu, Yi-Wen
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022: 1956 - 1963
  • [5] Hymnos - A network for the study of multi-voice singing between orality and writing
    Ginesi, Gianni
    TRANS-REVISTA TRANSCULTURAL DE MUSICA, 2012, 16
  • [6] A singing voice synthesizer controlled by the arm motions
    Ito, Masashi
    2013 NINTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2013), 2013: 468 - 471
  • [7] A Distinct Synthesizer Convolutional TasNet for Singing Voice Separation
    Tian, Congzhou
    Yang, Deshun
    Chen, Xiaoou
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 37 - 48
  • [8] Basaglia's gesture. Multi-voice conversation
    Unknown
    AUT AUT, 2024, (404): 115 - 121
  • [9] Design of Multi-voice Electronic Piano by Chip Microcomputer
    Li, Wei
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INFORMATION SCIENCES, MACHINERY, MATERIALS AND ENERGY (ICISMME 2015), 2015, 126 : 974 - 977
  • [10] What is the meaning of family participation in schools? A multi-voice perspective
    Martinez-Figueira, Maria-Esther
    Fernandez-Menor, Isabel
    Crestar Farina, Irene
    Mulloni Martinez, Samantha
    EDUCATIONAL RESEARCH, 2024, 66 (04) : 381 - 395