SINGING VOICE CONVERSION WITH NON-PARALLEL DATA

Cited by: 12
Authors
Chen, Xin [1]
Chu, Wei [1]
Guo, Jinxi [2]
Xu, Ning [1]
Affiliations
[1] Snap Inc, Snap Res, Santa Monica, CA 90405 USA
[2] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
Keywords
Singing voice conversion; phonetic posteriors; non-parallel data; singer-independent content; deep neural networks (DNN);
DOI
10.1109/MIPR.2019.00059
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Singing voice conversion is the task of converting a song sung by a source singer into the voice of a target singer. In this paper, we propose applying a parallel-data-free, many-to-one voice conversion technique to singing voices. A phonetic posterior feature is first generated by decoding the singing voice with a robust automatic speech recognition (ASR) engine. Then, a trained recurrent neural network (RNN) with a deep bidirectional long short-term memory (DBLSTM) structure models the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are extracted from the original singing voice and used together with the acoustic features to reconstruct the target singing voice through a vocoder. In the obtained singing voice, the targeted and sourced singers sound similar. To our knowledge, this is the first study that uses non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.
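The data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the ASR decoder is replaced by a random softmax, the trained DBLSTM by a per-frame linear map, and the vocoder by a function that simply bundles its inputs. All names and dimensions (`N_PHONES`, `N_SPECTRAL`, `phonetic_posteriors`, `map_to_target`, `convert`) are hypothetical.

```python
import math
import random

N_PHONES = 4     # hypothetical number of phonetic classes in the posterior
N_SPECTRAL = 3   # hypothetical dimension of the target acoustic features


def phonetic_posteriors(frames, rng):
    """Stand-in for the ASR decoder: one posterior distribution per frame."""
    ppg = []
    for _ in frames:
        logits = [rng.gauss(0.0, 1.0) for _ in range(N_PHONES)]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        ppg.append([e / total for e in exps])  # softmax, sums to 1
    return ppg


def map_to_target(ppg, weights):
    """Stand-in for the trained DBLSTM: a per-frame linear map.

    weights holds N_SPECTRAL rows of N_PHONES coefficients each.
    """
    return [[sum(p * w for p, w in zip(row, coeffs)) for coeffs in weights]
            for row in ppg]


def convert(frames, f0, aperiodicity, weights, rng):
    """Assemble per-frame vocoder inputs for the target voice."""
    ppg = phonetic_posteriors(frames, rng)       # singer-independent content
    spectral = map_to_target(ppg, weights)       # target-singer acoustics
    # F0 and aperiodicity are taken from the source singing voice unchanged.
    return [{"spectral": s, "f0": f, "aperiodicity": a}
            for s, f, a in zip(spectral, f0, aperiodicity)]


rng = random.Random(0)
frames = [[0.0] * 8 for _ in range(5)]           # placeholder audio frames
f0 = [220.0, 221.0, 0.0, 219.5, 220.5]           # 0.0 marks an unvoiced frame
aperiodicity = [0.1] * 5
weights = [[rng.uniform(-1.0, 1.0) for _ in range(N_PHONES)]
           for _ in range(N_SPECTRAL)]
vocoder_input = convert(frames, f0, aperiodicity, weights, rng)
```

The point of the sketch is the split of responsibilities: only the spectral features pass through the content-to-target mapping, while the melody-carrying F0 contour is copied from the source, which is why no parallel recordings of the two singers are needed.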
Pages: 292-296
Page count: 5