High quality voice conversion through phoneme-based linear mapping functions with STRAIGHT for Mandarin

Cited by: 48
Authors
Liu, Kun [1]
Zhang, Jianping [1]
Yan, Yonghong [1]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Beijing 100083, Peoples R China
Keywords
voice conversion; formant transitions; main vowel; phoneme-based mapping functions
DOI
10.1109/FSKD.2007.347
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A novel voice conversion system using phoneme-based linear mapping functions on main vowel phonemes is proposed in this paper. Our voice conversion algorithm has the following three improvements. First, instead of using all the Vocal Tract Resonance (VTR) vectors in the portion of a phoneme, we use the VTR vector at the steady state of each phoneme to train a phoneme-based GMM. Second, different linear mapping functions are trained to describe the mapping relationships for corresponding phonemes. Third, in the transformation procedure, the transformed formant frequencies at the main vowel phonemes are obtained using the corresponding GMM. In addition, prosody parameters are also transformed. Finally, the converted speech is re-synthesized from the transformed parameters by the high-quality speech manipulation framework STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram). Perceptual results for female-to-male (F-M) and male-to-female (M-F) conversion show that, compared with IBM's system, the MOS score of the converted voice improves from 3.8 to 4.1 and the ABX score from 3.3 to 3.8. Comparisons with other systems are also given in this paper.
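The idea of training a separate mapping for each phoneme from steady-state frames can be illustrated with a small sketch. This is not the paper's method: the GMM is replaced here by a single least-squares affine map per phoneme and per formant dimension, and the function names (`fit_phoneme_maps`, `convert_frame`) and all formant values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def fit_phoneme_maps(pairs):
    """Fit one affine map per phoneme and per formant dimension.

    pairs: iterable of (phoneme, source_vector, target_vector), where each
    vector is a hypothetical steady-state formant (VTR) frame in Hz.
    Returns {phoneme: [(slope, intercept), ...]}.
    """
    grouped = defaultdict(list)
    for phoneme, src, tgt in pairs:
        grouped[phoneme].append((src, tgt))

    maps = {}
    for phoneme, frames in grouped.items():
        coeffs = []
        for d in range(len(frames[0][0])):
            xs = [src[d] for src, _ in frames]
            ys = [tgt[d] for _, tgt in frames]
            mean_x = sum(xs) / len(xs)
            mean_y = sum(ys) / len(ys)
            var_x = sum((x - mean_x) ** 2 for x in xs)
            cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            # With no spread in the source data, fall back to a pure shift.
            slope = cov_xy / var_x if var_x > 0 else 1.0
            coeffs.append((slope, mean_y - slope * mean_x))
        maps[phoneme] = coeffs
    return maps


def convert_frame(maps, phoneme, src):
    """Apply the affine map of the given phoneme to one source frame."""
    return [a * x + b for (a, b), x in zip(maps[phoneme], src)]


# Made-up paired frames (F1, F2 in Hz) for two vowel phonemes.
training = [
    ("a", [700.0, 1200.0], [800.0, 1400.0]),
    ("a", [750.0, 1300.0], [850.0, 1500.0]),
    ("i", [300.0, 2200.0], [330.0, 2500.0]),
    ("i", [320.0, 2300.0], [350.0, 2600.0]),
]
maps = fit_phoneme_maps(training)
print(convert_frame(maps, "a", [720.0, 1250.0]))
```

Keeping one map per phoneme means a frame labelled as one vowel is never converted with statistics pooled from another, which is the motivation the abstract gives for phoneme-based training. A fuller treatment would use a mixture model with full covariance in place of the per-dimension fit assumed here.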
Pages: 410-414
Page count: 5
Related papers
12 in total
  • [1] Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck
    Lee, Sang-Hoon
    Noh, Hyeong-Rae
    Nam, Woo-Jeoung
    Lee, Seong-Whan
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1173 - 1183
  • [2] Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model
    Nguyen, Binh Phu
    Akagi, Masato
    [J]. 2008 SECOND INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, 2008, : 222 - 227
  • [3] PHONEME CLUSTER BASED STATE MAPPING FOR TEXT-INDEPENDENT VOICE CONVERSION
    Zhang, Meng
    Tao, Jiaohua
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4281 - +
  • [4] A ANN BASED HIGH QUALITY METHOD FOR VOICE CONVERSION
    Chen, Z.
    Zhang, L. H.
    [J]. 2010 6TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS NETWORKING AND MOBILE COMPUTING (WICOM), 2010,
  • [5] High Quality Voice Conversion based on ISODATA Clustering Algorithm
    Li, Yanping
    Zuo, Yutao
    Yang, Zhen
    Shao, Xi
    [J]. 2017 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (IEEE ISKE), 2017,
  • [6] High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
    Chen, Kuan
    Chen, Bo
    Lai, Jiahao
    Yu, Kai
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1993 - 1997
  • [7] HIGH-QUALITY NONPARALLEL VOICE CONVERSION BASED ON CYCLE-CONSISTENT ADVERSARIAL NETWORK
    Fang, Fuming
    Yamagishi, Junichi
    Echizen, Isao
    Lorenzo-Trueba, Jaime
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5279 - 5283
  • [8] A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model
    Suda, Hitoshi
    Kotani, Gaku
    Takamichi, Shinnosuke
    Saito, Daisuke
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 816 - 822
  • [9] Improving the Quality of Standard GMM-Based Voice Conversion Systems by Considering Physically Motivated Linear Transformations
    Zorila, Tudor-Catalin
    Erro, Daniel
    Hernaez, Inma
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, 2012, 328 : 30 - 39
  • [10] High-quality voice conversion system based on GMM statistical parameters and RBF neural network
    CHEN Xian-tong
    ZHANG Ling-hua
    [J]. The Journal of China Universities of Posts and Telecommunications, 2014, (05) : 68 - 75