Novel approach of MFCC based alignment and WD-residual modification for voice conversion using RBF

Cited by: 2
Authors
Nirmal, Jagannath [1]
Zaveri, Mukesh [2]
Patnaik, Suprava [3]
Kachare, Pramod [4]
Affiliations
[1] KJ Somaiya Coll Engn, Dept Elect Engn, Bombay 400077, Maharashtra, India
[2] SV Natl Inst Technol, Dept Comp Engn, Surat 395007, India
[3] SV Natl Inst Technol, Dept Elect Engn, Surat 395007, India
[4] Veermata Jeejabai Inst Technol, Dept Elect Engn, Bombay 400031, Maharashtra, India
Keywords
Dynamic time warping; Gaussian mixture model; LP-residual; Line spectral frequencies; Mel frequency cepstrum coefficient; Radial basis function; Residual selection method; Wavelet packet transform; NEURAL-NETWORKS; ALGORITHM; FEATURES; MIXTURE
DOI
10.1016/j.neucom.2016.07.048
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A voice conversion system modifies the speaker-specific characteristics of a source speaker so that the speech is perceived as that of a target speaker. Speaker-specific characteristics of the speech signal are reflected at different levels, such as the shape of the vocal tract, the shape of the glottal excitation, and long-term prosody. The shape of the vocal tract is represented by Line Spectral Frequencies (LSF) and the shape of the glottal excitation by the Linear Predictive (LP) residual. In this paper, a fourth-level wavelet packet transform is applied to the LP-residual to generate sixteen sub-bands. This approach not only reduces the computational complexity but also presents a genuine transformation model over state-of-the-art statistical prediction methods. In voice conversion, alignment is an essential step that matches the features of the source and target speakers frame by frame. In this paper, a warping path based on Mel Frequency Cepstrum Coefficients (MFCC) is proposed to align the LSF and the LP-residual sub-bands, using two proposed schemes: constant source alignment and constant target alignment. The conventional alignment technique is compared with both proposed schemes. Analysis shows that constant source alignment using the MFCC warping path performs slightly better than both constant target alignment and the state-of-the-art alignment approach. Generalized mapping models are developed for each sub-band using a Radial Basis Function (RBF) neural network and are compared with a Gaussian Mixture Model (GMM) based mapping and a residual selection approach. Various subjective and objective evaluation measures indicate that the RBF-based residual mapping approach performs significantly better than the state-of-the-art approaches. (C) 2016 Elsevier B.V. All rights reserved.
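The alignment idea in the abstract can be illustrated with a minimal sketch: a dynamic time warping (DTW) path is computed on MFCC-like frames and then reused to pair frames of a second feature stream, such as LSF vectors. This is not the authors' implementation. The function names, the Euclidean frame distance, and in particular the reading of "constant source" (one pair per source frame) and "constant target" (one pair per target frame) are assumptions made for illustration, since the abstract does not define them.

```python
def dtw_path(a, b):
    """Return a DTW warping path [(i, j), ...] between two lists of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")

    def dist(u, v):
        # Euclidean distance between two frames (an assumed choice of local cost).
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(a[i - 1], b[j - 1]) + best_prev

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def align_constant_source(path, src_feats, tgt_feats):
    """Hypothetical 'constant source' pairing: keep every source frame exactly once,
    paired with the first target frame matched to it on the path."""
    chosen = {}
    for i, j in path:
        chosen.setdefault(i, j)
    return [(src_feats[i], tgt_feats[chosen[i]]) for i in range(len(src_feats))]


def align_constant_target(path, src_feats, tgt_feats):
    """Hypothetical 'constant target' pairing: keep every target frame exactly once,
    paired with the first source frame matched to it on the path."""
    chosen = {}
    for i, j in path:
        chosen.setdefault(j, i)
    return [(src_feats[chosen[j]], tgt_feats[j]) for j in range(len(tgt_feats))]


# Toy data: 1-dimensional "MFCC" frames drive the path; string labels stand in
# for the LSF (or residual sub-band) frames that are aligned with that path.
src_mfcc = [[0.0], [1.0], [2.0]]
tgt_mfcc = [[0.0], [0.0], [1.0], [2.0]]
src_lsf = ["s0", "s1", "s2"]
tgt_lsf = ["t0", "t1", "t2", "t3"]

path = dtw_path(src_mfcc, tgt_mfcc)
pairs_cs = align_constant_source(path, src_lsf, tgt_lsf)
pairs_ct = align_constant_target(path, src_lsf, tgt_lsf)
```

Under this reading, the constant source pairing yields as many frame pairs as the source has frames, and the constant target pairing as many as the target has, which fixes the size of the training set passed to a mapping model.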
Pages: 39-49
Number of pages: 11