Speaker conversion using kernel non-negative matrix factorization

Authors
Xu Qinyu [1, 2]
Lu Guanming [1, 2]
Yan Jingjie [1, 2]
Li Haibo [1, 2]
Cheng Xiao [1, 2]
Affiliations
[1] College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications
[2] Jiangsu Province Key Laboratory on Image Processing and Image
Abstract
Voice conversion (VC) based on the Gaussian mixture model (GMM) is the most classic and common method for converting a source spectrum to a target spectrum. However, this method is prone to over-fitting because of its frame-by-frame conversion. This paper presents VC with non-negative matrix factorization (NMF), which can keep the spectrum from over-fitting by adjusting the size of the basis vector set (dictionary). To better realize the non-linear mapping, kernel NMF (KNMF) is adopted for spectrum mapping. In addition, to increase conversion accuracy, KNMF combined with GMM (GKNMF) is also introduced into VC. Finally, the KNMF, GKNMF, GMM, principal component regression (PCR), PCR combined with GMM (GPCR), partial least squares regression (PLSR), NMF correlation-based frequency warping (NMF-CFW), and deep neural network (DNN) methods are compared with each other. The proposed GKNMF achieves better performance in both objective and subjective evaluations.
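The abstract gives no formulas, so the following is only a minimal sketch of the general idea behind dictionary-based NMF spectrum mapping. It is not the paper's KNMF or GKNMF method. It assumes paired source and target dictionaries (`W_src`, `W_tgt`) whose columns are aligned basis vectors, estimates non-negative activations for the source spectrum with the standard multiplicative update for the Euclidean cost, and reuses those activations with the target dictionary. The function names, the toy data, and the choice of cost function are all assumptions made for illustration.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate H >= 0 such that V is approximately W @ H, with W held fixed.

    Uses the multiplicative update for the Euclidean (Frobenius) cost:
        H <- H * (W^T V) / (W^T W H)
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # eps guards against division by zero
    return H

def convert_spectrum(V_src, W_src, W_tgt, n_iter=200):
    """Map source spectra to target spectra through shared activations.

    Assumption: column k of W_src and column k of W_tgt describe the same
    underlying speech unit for the two speakers (a paired dictionary).
    """
    H = nmf_activations(V_src, W_src, n_iter=n_iter)
    return W_tgt @ H

# Toy usage with random non-negative data (hypothetical sizes).
rng = np.random.default_rng(1)
n_bins, n_basis, n_frames = 20, 5, 30
W_src = rng.random((n_bins, n_basis))
W_tgt = rng.random((n_bins, n_basis))
V_src = W_src @ rng.random((n_basis, n_frames))
V_converted = convert_spectrum(V_src, W_src, W_tgt)
```

In this sketch the number of columns in the dictionaries (`n_basis`) is the quantity the abstract refers to as the dictionary size: a smaller dictionary constrains the reconstruction more strongly.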
Pages: 8
Related papers
  • [1] Speaker conversion using kernel non-negative matrix factorization
    Xu Qinyu
    Lu Guanming
    Yan Jingjie
    Li Haibo
    Cheng Xiao
    [J]. The Journal of China Universities of Posts and Telecommunications, 2017, (05) : 60 - 67
  • [2] Speaker conversion using kernel non-negative matrix factorization
    [J]. Guanming, Lu (lugm@njupt.edu.cn), 2017, Beijing University of Posts and Telecommunications (24):
  • [3] Fast speaker adaptation using non-negative matrix factorization
    Duchateau, Jacques
    Leroy, Tobias
    Demuynck, Kris
    Van hamme, Hugo
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4269 - 4272
  • [4] Speaker Clustering Based on Non-negative Matrix Factorization
    Nishida, Masafumi
    Yamamoto, Seiichi
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 956 - 959
  • [5] Kernel Non-negative Matrix Factorization Using Self-Constructed Cosine Kernel
    Qian, Huihui
    Chen, Wen-Sheng
    Pan, Binbin
    Chen, Bo
    [J]. 2020 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS 2020), 2020, : 186 - 190
  • [6] RAPID SPEAKER ADAPTATION WITH SPEAKER ADAPTIVE TRAINING AND NON-NEGATIVE MATRIX FACTORIZATION
    Zhang, Xueru
    Demuynck, Kris
    Van Hamme, Hugo
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4456 - 4459
  • [7] Rapid speaker adaptation in latent speaker space with non-negative matrix factorization
    Zhang, Xueru
    Demuynck, Kris
    Van Hamme, Hugo
    [J]. SPEECH COMMUNICATION, 2013, 55 (09) : 893 - 908
  • [8] Incremental Kernel Non-negative Matrix Factorization For Hyperspectral Unmixing
    Huang, Risheng
    Li, Xiaorun
    Zhao, Liaoying
    [J]. 2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 6569 - 6572
  • [9] Kernel Joint Non-Negative Matrix Factorization for Genomic Data
    Salazar, Diego
    Rios, Juan
    Aceros, Sara
    Florez-Vargas, Oscar
    Valencia, Carlos
    [J]. IEEE ACCESS, 2021, 9 : 101863 - 101875
  • [10] Kernel Non-Negative Matrix Factorization for Seismic Signature Separation
    Mehmood, Asif
    Damarla, Thyagaraju
    [J]. JOURNAL OF PATTERN RECOGNITION RESEARCH, 2013, 8 (01) : 13 - 24