SINGING VOICE CONVERSION WITH NON-PARALLEL DATA

Times cited: 12
Authors
Chen, Xin [1]
Chu, Wei [1]
Guo, Jinxi [2]
Xu, Ning [1]
Affiliations
[1] Snap Inc, Snap Res, Santa Monica, CA 90405 USA
[2] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
Keywords
Singing voice conversion; phonetic posteriors; non-parallel data; singer-independent content; deep neural networks (DNN)
DOI
10.1109/MIPR.2019.00059
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Singing voice conversion is the task of converting a song sung by a source singer into the voice of a target singer. In this paper, we propose applying a many-to-one voice conversion technique that does not require parallel data to singing voices. Phonetic posterior features are first generated by decoding the singing voice with a robust automatic speech recognition (ASR) engine. A trained recurrent neural network (RNN) with a deep bidirectional long short-term memory (DBLSTM) structure then models the mapping from this person-independent content to the acoustic features of the target person. F0 and aperiodicity are extracted from the original singing voice and combined with the predicted acoustic features to reconstruct the target singing voice through a vocoder. In the converted singing voice, the source and target singers sound similar. To our knowledge, this is the first study to use non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.
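The data flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The ASR engine is replaced by hypothetical per-frame phone-class logits. The trained DBLSTM is replaced by an untrained toy bidirectional RNN with tanh cells rather than LSTM cells. The vocoder itself is not named in the abstract, so the sketch stops at assembling per-frame vocoder inputs. All names (`phonetic_posteriors`, `ToyBiRNN`, `VocoderFrame`, `convert`) are hypothetical. The sketch shows that only the acoustic features are predicted for the target singer, while F0 and aperiodicity pass through unchanged from the source.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one frame of phone-class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phonetic_posteriors(frame_logits: Sequence[Sequence[float]]) -> List[Vector]:
    """Per-frame phone-class posteriors (a phonetic posteriorgram).

    Assumption: `frame_logits` stands in for the scores a real ASR
    acoustic model would output for each frame of the source singing.
    """
    return [softmax(f) for f in frame_logits]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyBiRNN:
    """Untrained bidirectional RNN with tanh cells: a simplified, hypothetical
    stand-in for the trained DBLSTM (posteriors in, acoustic features out)."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.wx_fwd = random_matrix(hidden, in_dim, rng)
        self.wh_fwd = random_matrix(hidden, hidden, rng)
        self.wx_bwd = random_matrix(hidden, in_dim, rng)
        self.wh_bwd = random_matrix(hidden, hidden, rng)
        self.w_out = random_matrix(out_dim, 2 * hidden, rng)

    def _scan(self, xs: List[Vector], wx: Matrix, wh: Matrix) -> List[Vector]:
        h = [0.0] * self.hidden
        states = []
        for x in xs:
            # New state from the current input and the previous state.
            h = [math.tanh(a + b) for a, b in zip(matvec(wx, x), matvec(wh, h))]
            states.append(h)
        return states

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        fwd = self._scan(xs, self.wx_fwd, self.wh_fwd)
        # Run right-to-left, then re-reverse so index t lines up with frame t.
        bwd = self._scan(xs[::-1], self.wx_bwd, self.wh_bwd)[::-1]
        # Each output sees both past (fwd) and future (bwd) context.
        return [matvec(self.w_out, f + b) for f, b in zip(fwd, bwd)]


@dataclass
class VocoderFrame:
    f0: float              # copied from the source singer
    aperiodicity: Vector   # copied from the source singer
    acoustic: Vector       # predicted for the target singer


def convert(frame_logits, source_f0, source_ap, mapper) -> List[VocoderFrame]:
    """Assemble per-frame vocoder inputs for the converted singing voice."""
    if not (len(frame_logits) == len(source_f0) == len(source_ap)):
        raise ValueError("all inputs must have one entry per frame")
    acoustic = mapper(phonetic_posteriors(frame_logits))
    return [VocoderFrame(f0, ap, ac)
            for f0, ap, ac in zip(source_f0, source_ap, acoustic)]


if __name__ == "__main__":
    rng = random.Random(1)
    n_frames, n_phones = 4, 5
    logits = [[rng.gauss(0, 1) for _ in range(n_phones)] for _ in range(n_frames)]
    f0 = [220.0, 221.5, 0.0, 246.9]  # 0.0 marks an unvoiced frame
    ap = [[0.1, 0.2] for _ in range(n_frames)]
    frames = convert(logits, f0, ap, ToyBiRNN(n_phones, hidden=8, out_dim=3))
    for fr in frames:
        print(fr.f0, [round(v, 3) for v in fr.acoustic])
```

The bidirectional pass matters because each output frame depends on both preceding and following phonetic context. This is the property the "bidirectional" part of DBLSTM provides. Because the mapper consumes posteriors rather than the source singer's spectra, it never needs time-aligned source and target recordings.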
Pages: 292–296
Page count: 5