PERCEPTUAL CLUSTERING BASED UNIT SELECTION OPTIMIZATION FOR CONCATENATIVE TEXT-TO-SPEECH SYNTHESIS

Cited: 0
Authors
Jiang, Tao [1 ,2 ]
Wu, Zhiyong [1 ,2 ]
Jia, Jia [2 ]
Cai, Lianhong [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Grad Sch Shenzhen, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Shenzhen 518055, Peoples R China
[2] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol TNList, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Perceptual clustering; decision tree; linear discriminant analysis; cost function; unit selection
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In concatenation-based speech synthesis, the purpose of unit selection is to select appropriate speech units from the speech corpus by measuring how well candidate units match the given features. Perceptual tests indicate that some features are consistently preferred when making perceptual distinctions between units. Such features should be evaluated before others in unit selection. In this work, we identify the priorities of different features and optimize unit selection with perceptual clustering. Our approach first clusters the speech units with hierarchical clustering based on a perceptual distance measure between speech units. A method is then proposed to identify the questions (concerning the features) used to build a decision tree from the clustering result. The features used in the decision tree are the preferred ones, and the remaining features are used in the target cost function. Linear discriminant analysis (LDA) is then adopted to train the weights of the target cost function from the clustering result, making the weights more reasonable and perceptually relevant. Experimental results indicate that the optimized unit selection generates synthetic speech with higher naturalness than the previous approach.
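The following is a minimal sketch, not the authors' implementation, of the pipeline outlined in the abstract: hierarchical clustering of units under a perceptual distance, followed by LDA over the resulting cluster labels to derive per-feature weights for the target cost. The perceptual distance, the unit feature layout, and the use of the first discriminant direction as weights are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def cluster_units(dist_matrix, n_clusters):
    """Agglomerative clustering of speech units from a precomputed
    (units x units) perceptual distance matrix."""
    condensed = squareform(dist_matrix, checks=False)   # square -> condensed form
    tree = linkage(condensed, method="average")         # average-link hierarchy
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def lda_target_cost_weights(features, labels):
    """Fit LDA on (feature vector, cluster label) pairs and use the magnitude
    of the first discriminant direction as per-feature weights, normalized
    to sum to one (an assumed weighting scheme, not the paper's)."""
    lda = LinearDiscriminantAnalysis().fit(features, labels)
    w = np.abs(lda.scalings_[:, 0])
    return w / w.sum()


def target_cost(weights, target_feats, unit_feats):
    """Weighted absolute difference between target and candidate unit features."""
    return float(np.dot(weights, np.abs(target_feats - unit_feats)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(40, 6))                    # 40 units, 6 features (toy data)
    # Stand-in for a perceptual distance: summed absolute feature differences.
    dists = np.abs(feats[:, None, :] - feats[None, :, :]).sum(-1)
    labels = cluster_units(dists, n_clusters=4)
    weights = lda_target_cost_weights(feats, labels)
    print(target_cost(weights, feats[0], feats[1]))
```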
Pages: 64 - 68
Page count: 5