Statistical Singing Voice Conversion based on Direct Waveform Modification with Global Variance

Cited by: 0
Authors
Kobayashi, Kazuhiro [1]
Toda, Tomoki [1]
Neubig, Graham [1]
Sakti, Sakriani [1]
Nakamura, Satoshi [1]
Affiliation
[1] Nara Institute of Science and Technology (NAIST), Graduate School of Information Science, Nara, Japan
Keywords
statistical singing voice conversion; direct waveform modification; spectral differential; global variance; Gaussian mixture model
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents techniques to improve the quality of singing voices generated by statistical singing voice conversion with direct waveform modification based on the spectrum differential (DIFFSVC). The DIFFSVC method converts the singing voice characteristics of a source singer into those of a target singer without vocoder-based waveform generation. However, the quality of the converted singing voice is still degraded compared with that of natural singing voice owing to various factors, such as over-smoothing of the converted spectral parameter trajectory. To alleviate this over-smoothing, we propose a technique that restores the global variance of the converted spectral parameter trajectory within the DIFFSVC framework. We also propose a technique that specifically avoids over-smoothing in unvoiced frames. Results of subjective and objective evaluations demonstrate that, compared with conventional DIFFSVC, the proposed techniques significantly improve the quality of the converted singing voice while preserving the conversion accuracy of singer identity.
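The abstract does not give the update equations, so the following is only a minimal sketch of the general idea of restoring global variance (GV) while staying in the differential domain. It assumes the simple per-dimension variance-scaling form of GV compensation (a swapped-in technique, not necessarily the paper's method), treats the converted trajectory as source parameters plus predicted differential, and assumes a per-dimension target GV precomputed from target-singer data. The function name `gv_scale_differential` is hypothetical.

```python
from statistics import fmean, pvariance


def gv_scale_differential(source, diff, target_gv):
    """Hypothetical GV compensation of a spectral-differential sequence.

    source:    T x D source spectral parameters (e.g. mel-cepstra), one list per frame
    diff:      T x D spectral differentials predicted by a conversion model
    target_gv: length-D target global variance per dimension (assumed precomputed)
    Returns a modified T x D differential sequence.
    """
    num_frames = len(source)
    num_dims = len(source[0])
    # Converted trajectory implied by the differential: y_t = x_t + d_t
    converted = [[source[t][d] + diff[t][d] for d in range(num_dims)]
                 for t in range(num_frames)]
    new_diff = [[0.0] * num_dims for _ in range(num_frames)]
    for d in range(num_dims):
        column = [converted[t][d] for t in range(num_frames)]
        mean = fmean(column)
        var = pvariance(column, mu=mean)
        # Scale deviations from the mean so the variance matches the target GV;
        # a constant (zero-variance) dimension is left unscaled
        scale = (target_gv[d] / var) ** 0.5 if var > 0 else 1.0
        for t in range(num_frames):
            y_gv = mean + scale * (column[t] - mean)
            # Map back to the differential domain so the source waveform can
            # still be filtered directly instead of being re-synthesized
            new_diff[t][d] = y_gv - source[t][d]
    return new_diff
```

Working in the differential domain keeps the sketch compatible with the vocoder-free idea described above: only the filter applied to the source waveform changes, while the source excitation is never modelled.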
Pages: 2754-2758
Number of pages: 5