Transformation of Prosody in Voice Conversion

Authors
Sisman, Berrak [1]
Li, Haizhou [1]
Tan, Kay Chen [2]
Affiliations
[1] National University of Singapore, Singapore, Singapore
[2] City University of Hong Kong, Hong Kong, Hong Kong, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Voice conversion (VC) aims to make one speaker's voice sound like that of another. So far, most voice conversion frameworks have focused mainly on spectrum conversion. We note that speaker identity is also characterized by prosodic features such as fundamental frequency (F0), energy contour, and duration. Motivated by this, we propose a framework that performs F0, energy contour, and duration conversion. In the traditional exemplar-based sparse representation approach to voice conversion, a general source-target dictionary of exemplars is constructed to establish the correspondence between source and target speakers. In this work, we propose a phonetically aware sparse representation of the fundamental frequency and energy contour using the Continuous Wavelet Transform (CWT). Our idea is motivated by the fact that CWT decompositions of F0 and energy contours describe prosody patterns at different temporal scales and allow for effective prosody manipulation in speech synthesis. Furthermore, phonetically aware exemplars lead to better estimation of the activation matrix and therefore possibly to better prosody conversion. We also propose a phonetically aware duration conversion framework that takes into account both phone-level and sentence-level speaking rates. We report that the proposed prosody conversion outperforms traditional prosody conversion techniques in both objective and subjective evaluations.
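The exemplar-based idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a paired dictionary whose columns are time-aligned source and target exemplars, non-negative features, and a commonly used multiplicative update that minimizes KL divergence with an L1 sparsity penalty. The names `estimate_activations`, `convert`, `A_src`, `B_tgt`, and the `sparsity` parameter are hypothetical. In the setting the abstract describes, the columns would hold CWT-based F0 and energy features and the dictionary would be selected per phone; both steps are left out here.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative activations H such that X is close to A @ H.

    X: (dim, frames) non-negative source features.
    A: (dim, n_exemplars) non-negative source dictionary.
    Uses multiplicative updates for KL divergence plus an L1 penalty
    (an assumed, commonly used choice, not taken from the paper).
    """
    rng = np.random.default_rng(0)          # fixed seed for repeatability
    H = rng.random((A.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)           # element-wise X / (A H)
        H *= (A.T @ ratio) / (A.T @ ones + sparsity + eps)
    return H


def convert(X_src, A_src, B_tgt, **kwargs):
    """Convert source features by reusing activations with the target dictionary.

    A_src and B_tgt must have the same number of columns, with column k of
    each holding the same aligned exemplar from the two speakers.
    """
    H = estimate_activations(X_src, A_src, **kwargs)
    return B_tgt @ H, H
```

Calling `convert(X_src, A_src, B_tgt)` returns the converted features together with the activation matrix. The design choice worth noting is that the activations are estimated only against the source dictionary and then applied unchanged to the target dictionary, so the quality of the conversion depends on how well the selected exemplars explain the source frames.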
Pages: 1588-1597
Page count: 10