Finding Relevant Features for Statistical Speech Synthesis Adaptation

Cited by: 0
Authors:
Bruneau, Pierrick [1 ]
Parisot, Olivier [1 ]
Mohammadi, Amir [2 ]
Demiroglu, Cenk [2 ]
Ghoniem, Mohammad [1 ]
Tamisier, Thomas [1 ]
Institutions:
[1] Gabriel Lippmann Informat, Ctr Rech Publ, Syst & Collaborat Dept, L-4422 Belvaux, Luxembourg
[2] Ozyegin Univ, Elect & Elect Engn Dept, Istanbul, Turkey
Keywords:
Speech Synthesis; Speaker Adaptation; Feature Selection; Visual Analytics
DOI:
Not available
Chinese Library Classification:
H0 [Linguistics];
Subject Classification Codes:
030303; 0501; 050102;
Abstract:
Statistical speech synthesis (SSS) models typically lie in a very high-dimensional space. They can be used to enable speech synthesis on digital devices using only a few sentences of input from the user. However, adaptation algorithms for such weakly trained models suffer from the high dimensionality of the feature space. Because creating new voices is easy with the SSS approach, thousands of voices can be trained, and a Nearest-Neighbor (NN) algorithm can be used to obtain better speaker similarity in these limited-data cases. NN methods require good distance measures that correlate well with human perception. This paper investigates the problem of finding good low-cost metrics, i.e. simple functions of feature values that align with objective signal quality metrics. We show this is an ill-posed problem and study its conversion to a tractable form. Tentative solutions are found using statistical analyses. Our results are promising: a performance index improved by 36% w.r.t. a naive solution, while using only 0.77% of the respective amount of features. Deeper insights into our results are then unveiled using visual methods, namely high-dimensional data visualization and dimensionality reduction techniques. Perspectives on new adaptation algorithms, and on tighter integration of data mining and visualization principles, are finally given.
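The NN-based adaptation idea in the abstract can be illustrated with a minimal sketch: given a feature vector measured from the target speaker's few sentences and a library of features for pre-trained voices, pick the closest voice under a weighted distance, where a sparse weight vector stands in for the paper's "low-cost metric" built from a small feature subset. All function and variable names below are illustrative, not the authors' actual implementation.

```python
import math

def nearest_voice(target_feats, voice_library, weights=None):
    """Pick the pre-trained voice whose features are closest to the target.

    target_feats:  feature values measured from the target speaker's sentences.
    voice_library: list of feature vectors, one per pre-trained voice.
    weights:       optional per-feature weights; a sparse weight vector is one
                   way to realise a low-cost metric over a small feature subset.
    """
    d = len(target_feats)
    w = weights if weights is not None else [1.0] * d
    def dist(v):
        # Weighted Euclidean distance between a library voice and the target.
        return math.sqrt(sum(wi * (vi - ti) ** 2
                             for wi, vi, ti in zip(w, v, target_feats)))
    dists = [dist(v) for v in voice_library]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists

best, dists = nearest_voice([0.2, 1.5],
                            [[0.0, 0.0], [0.3, 1.4], [2.0, 2.0]])
# the second library voice (index 1) is the closest match
```

In practice the distance would be computed over the selected low-cost features and validated against perceptual similarity judgments, as the paper's framing requires.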
Pages: 8
Related Papers (50 records)
  • [1] VTLN ADAPTATION FOR STATISTICAL SPEECH SYNTHESIS
    Saheer, Lakshmi
    Garner, Philip N.
    Dines, John
    Liang, Hui
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4838 - 4841
  • [2] Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis
    Qader, Raheel
    Lecorve, Gwenole
    Lolive, Damien
    Tahon, Marie
    Sebillot, Pascale
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 92 - 101
  • [3] HotPatch: A statistical approach to finding biologically relevant features on protein surfaces
    Pettit, Frank K.
    Bare, Emiko
    Tsai, Albert
    Bowie, James U.
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 2007, 369 (03) : 863 - 879
  • [4] Finding relevant features for zero-resource query-by-example search on speech
    Lopez-Otero, Paula
    Docio-Fernandez, Laura
    Garcia-Mateo, Carmen
    [J]. SPEECH COMMUNICATION, 2016, 84 : 24 - 35
  • [5] Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review
    Adiga, Nagaraj
    Prasanna, S. R. M.
    [J]. IETE TECHNICAL REVIEW, 2019, 36 (02) : 130 - 149
  • [6] A style adaptation technique for speech synthesis using HSMM and suprasegmental features
    Tachibana, M
    Yamagishi, J
    Masuko, T
    Kobayashi, T
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): : 1092 - 1099
  • [7] Excitation modelling using epoch features for statistical parametric speech synthesis
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    [J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [8] DESIGNING RELEVANT FEATURES FOR VISUAL SPEECH RECOGNITION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 2420 - 2424
  • [9] Speaker Adaptation for Slovak Statistical Parametric Speech Synthesis Based on Hidden Markov Models
    Sulir, Martin
    Juhar, Jozef
    [J]. 2015 25TH INTERNATIONAL CONFERENCE RADIOELEKTRONIKA (RADIOELEKTRONIKA), 2015, : 137 - 140
  • [10] Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
    Sarfjoo, Seyyed Saeed
    Demiroglu, Cenk
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 317 - 321