Frequency Warping for Speaker Adaption of Text-to-speech Synthesis

Authors
Gao, Weixun [1, 2]
Cao, Qiying [3]
Affiliations
[1] Donghua Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
[2] Shanghai Normal Univ, Shanghai, Peoples R China
[3] Donghua Univ, Coll Comp Sci & Technol, Shanghai, Peoples R China
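Source
Keywords
frequency warping; speaker adaptation; TTS
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract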
Vocal tract length normalization (VTLN) is generally used in speech recognition to remove individual speaker characteristics. In this paper, we apply VTLN to speaker adaptation for speech synthesis. We propose a new frequency warping approach to reduce the spectral distance between source and target speakers. The frequency warping function is based on a bilinear function, and the warping factor is generated dynamically, frame by frame. The warped spectra of the source speaker are then converted to LSPs to train hidden Markov models (HMMs). The HMMs are further adapted by maximum likelihood linear regression (MLLR) with the target speaker's data. The experimental results show that our frequency warping approach brings the warped spectra of the source speaker closer to those of the target speaker, and that the resulting adapted HMMs perform better than HMMs trained with unwarped spectra in terms of voice naturalness and speaker similarity.
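To make the bilinear warping idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard first-order all-pass form of the bilinear warp, ω' = ω + 2·arctan(α·sin ω / (1 − α·cos ω)) with |α| < 1. The function names, the linear interpolation, the squared-error distance, and the per-frame grid search over candidate factors are all hypothetical choices made for illustration; the abstract does not state how the per-frame warping factor is actually generated.

```python
import math


def bilinear_warp(omega, alpha):
    """Map a normalized angular frequency omega in [0, pi] through the
    first-order all-pass (bilinear) warp with factor alpha, |alpha| < 1."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_spectrum(spectrum, alpha):
    """Resample a spectrum sampled uniformly on [0, pi]: output bin k takes
    the linearly interpolated input value at the warped frequency of bin k."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        # Fractional input-bin position of the warped frequency, clamped to range.
        pos = bilinear_warp(omega, alpha) / math.pi * (n - 1)
        pos = min(max(pos, 0.0), float(n - 1))
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def best_alpha(source_frame, target_frame, candidates):
    """Hypothetical per-frame selection: pick the candidate factor whose
    warped source frame has the smallest squared distance to the target."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(candidates,
               key=lambda a: distance(warp_spectrum(source_frame, a), target_frame))
```

In this sketch, `best_alpha` would be called once per frame pair, so the factor can vary from frame to frame. The endpoints 0 and π are fixed points of the warp, which keeps the warped frequency axis inside the original band.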
Pages: 307 / +
Page count: 2
Related papers
50 in total
  • [1] SPEAKER INTONATION ADAPTATION FOR TRANSFORMING TEXT-TO-SPEECH SYNTHESIS SPEAKER IDENTITY
    Langarani, Mahsa Sadat Elyasi
    van Santen, Jan
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015: 116-123
  • [2] Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data
    Zhang, Xulong
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022: 456-460
  • [3] TEXT-TO-SPEECH SYNTHESIS
    SPROAT, RW
    OLIVE, JP
    AT&T TECHNICAL JOURNAL, 1995, 74(02): 35-44
  • [4] Learning Speaker Embedding from Text-to-Speech
    Cho, Jaejin
    Zelasko, Piotr
    Villalba, Jesus
    Watanabe, Shinji
    Dehak, Najim
    INTERSPEECH 2020, 2020: 3256-3260
  • [5] Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
    Jia, Ye
    Zhang, Yu
    Weiss, Ron J.
    Wang, Quan
    Shen, Jonathan
    Ren, Fei
    Chen, Zhifeng
    Nguyen, Patrick
    Pang, Ruoming
    Moreno, Ignacio Lopez
    Wu, Yonghui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [6] HEARING FACES: TARGET SPEAKER TEXT-TO-SPEECH SYNTHESIS FROM A FACE
    Pluester, Bjoern
    Weber, Cornelius
    Qu, Leyuan
    Wermter, Stefan
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 757-764
  • [7] ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
    Xue, Jinlong
    Deng, Yayue
    Han, Yichen
    Li, Ya
    Sun, Jianqing
    Liang, Jiaen
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022: 230-234
  • [8] Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Toda, Tomoki
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 2995-2999
  • [9] Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis
    Yamagishi, Junichi
    Nose, Takashi
    Zen, Heiga
    Ling, Zhen-Hua
    Toda, Tomoki
    Tokuda, Keiichi
    King, Simon
    Renals, Steve
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17(06): 1208-1230
  • [10] Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis
    Gao, Weixun
    Cao, Qiying
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2014, 30(04): 1149-1166