Quality-enhanced voice morphing using maximum likelihood transformations

Cited by: 63
Authors
Ye, Hui [1]
Young, Steve [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
linear transformation; phase dispersion; voice conversion; voice morphing
DOI
10.1109/TSA.2005.860839
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice morphing is a technique for modifying a source speaker's speech to sound as if it were spoken by a designated target speaker. The core process in a voice morphing system is the transformation of the spectral envelope of the source speaker to match that of the target speaker, and linear transformations estimated from time-aligned parallel training data are commonly used to achieve this. However, the naive application of envelope transformation combined with the necessary pitch and duration modifications results in noticeable artifacts. This paper studies the linear transformation approach to voice morphing and investigates two specific issues: the reliance on parallel training data and the artifacts themselves. First, a general maximum likelihood framework is proposed for transform estimation, which avoids the need for parallel training data inherent in conventional least mean square approaches. Second, the main causes of artifacts are identified as glottal coupling, unnatural phase dispersion, and the high spectral variance of unvoiced sounds, and compensation techniques are developed to mitigate them. The resulting voice morphing system is evaluated using both subjective and objective measures. These tests show that the proposed approaches can effectively transform speaker identity whilst maintaining high quality. Furthermore, they do not require carefully prepared parallel training data.
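For orientation, the conventional baseline the abstract contrasts against (a linear transform fitted by least squares to time-aligned parallel frames) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's maximum likelihood method: the feature vectors are synthetic random data standing in for spectral envelope parameters, a single global transform is used, and the names `estimate_linear_transform` and `apply_transform` are hypothetical.

```python
import numpy as np


def estimate_linear_transform(source, target):
    """Least-squares fit of target ~= source @ W.T + b from aligned frame pairs.

    source, target: arrays of shape (n_frames, dim), row i of each being
    one time-aligned pair of feature vectors (parallel data is assumed).
    """
    n_frames = source.shape[0]
    # Append a column of ones so the bias b is estimated jointly with W.
    augmented = np.hstack([source, np.ones((n_frames, 1))])
    solution, *_ = np.linalg.lstsq(augmented, target, rcond=None)
    W = solution[:-1].T  # (dim, dim) transform matrix
    b = solution[-1]     # (dim,) bias vector
    return W, b


def apply_transform(source, W, b):
    """Map source-speaker feature vectors towards the target speaker."""
    return source @ W.T + b


# Synthetic stand-in for parallel training data (an assumption for this demo).
rng = np.random.default_rng(0)
dim, n_frames = 4, 200
true_W = rng.normal(size=(dim, dim))
true_b = rng.normal(size=dim)
source_frames = rng.normal(size=(n_frames, dim))
noise = 0.01 * rng.normal(size=(n_frames, dim))
target_frames = source_frames @ true_W.T + true_b + noise

W, b = estimate_linear_transform(source_frames, target_frames)
converted = apply_transform(source_frames, W, b)
```

The sketch depends on every source frame having an aligned target frame, which is the requirement the abstract says the maximum likelihood framework avoids.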
Pages: 1301-1312
Page count: 12