Voice Conversion Using a Perceptual Criterion

Cited by: 3
Author
Lee, Ki-Seung [1]
Affiliation
[1] Konkuk Univ, Dept Elect Engn, 1 Hwayang Dong, Seoul 143701, South Korea
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 08
Funding
National Research Foundation of Singapore;
Keywords
voice conversion; joint conversion; perceptual distance measure;
DOI
10.3390/app10082884
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In voice conversion (VC), it is highly desirable to obtain transformed speech signals that are perceptually close to a target speaker's voice. To this end, the proposed VC scheme adopts a perceptually meaningful criterion that takes the human auditory system into account when measuring the distance between the converted and target voices. The conversion rules for the spectral-envelope features and the pitch modification factor were constructed jointly so that the perceptual distance was minimized. This minimization problem was solved within a deep neural network (DNN) framework in which the input features were derived from source speech signals and the target features from a time-aligned version of the target speech signals. Validation tests were carried out on the CMU ARCTIC database to evaluate the effectiveness of the proposed method, especially in terms of perceptual quality. The experimental results showed that the proposed method yielded perceptually preferred results compared with independent conversion using the conventional mean-square error (MSE) criterion. The maximum improvement in the perceptual evaluation of speech quality (PESQ) score over the conventional VC method was 0.312.
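The contrast the abstract draws between a plain MSE criterion and a perceptually motivated one can be illustrated with a small sketch. This is not the distance measure used in the paper, which the abstract does not specify; the mel-slope weighting, the function names, and the toy data below are all assumptions chosen only to show how a frequency-dependent weighting can rank two errors differently when MSE treats them as equal.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Common mel-scale formula, used here only as a stand-in auditory scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def perceptual_weights(n_bins, sample_rate=16000):
    """Hypothetical per-bin weights: local slope of the mel curve, normalized to sum to 1."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    slope = np.gradient(hz_to_mel(freqs), freqs)
    return slope / slope.sum()


def mse_distance(converted, target):
    """Conventional criterion: every frequency bin counts equally."""
    return float(np.mean((converted - target) ** 2))


def weighted_distance(converted, target, weights):
    """Assumed perceptual proxy: squared error weighted per bin, averaged over frames."""
    per_frame = np.sum(weights * (converted - target) ** 2, axis=-1)
    return float(np.mean(per_frame))


# Toy log-spectral envelopes: 5 frames, 64 frequency bins (synthetic data).
rng = np.random.default_rng(0)
target = rng.normal(size=(5, 64))

low_offset = np.zeros(64)
low_offset[:8] = 1.0    # error confined to the 8 lowest bins
high_offset = np.zeros(64)
high_offset[-8:] = 1.0  # same-sized error confined to the 8 highest bins

low_err = target + low_offset
high_err = target + high_offset
w = perceptual_weights(64)

print("MSE      low/high:", mse_distance(low_err, target), mse_distance(high_err, target))
print("weighted low/high:", weighted_distance(low_err, target, w),
      weighted_distance(high_err, target, w))
```

Under these assumptions the two errors have the same MSE, while the weighted distance penalizes the low-frequency error more heavily because the mel scale resolves low frequencies more finely. A DNN trained against such a weighted loss would therefore trade accuracy across bins differently from one trained against MSE.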
Pages: 16