Finding Relevant Features for Statistical Speech Synthesis Adaptation

Authors
Bruneau, Pierrick [1]
Parisot, Olivier [1]
Mohammadi, Amir [2]
Demiroglu, Cenk [2]
Ghoniem, Mohammad [1]
Tamisier, Thomas [1]
Affiliations
[1] Gabriel Lippmann Informat, Ctr Rech Publ, Syst & Collaborat Dept, L-4422 Belvaux, Luxembourg
[2] Ozyegin Univ, Elect & Elect Engn Dept, Istanbul, Turkey
Keywords
Speech Synthesis; Speaker Adaptation; Feature Selection; Visual Analytics
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
Statistical speech synthesis (SSS) models typically lie in a very high-dimensional space. They can be used to enable speech synthesis on digital devices from only a few sentences of user input. However, the adaptation algorithms for such weakly trained models suffer from the high dimensionality of the feature space. Because creating new voices is easy with the SSS approach, thousands of voices can be trained, and a Nearest-Neighbor (NN) algorithm can be used to obtain better speaker similarity in those limited-data cases. NN methods require good distance measures that correlate well with human perception. This paper investigates the problem of finding good low-cost metrics, i.e. simple functions of feature values that map to objective signal quality metrics. We show that this is an ill-posed problem, and study its conversion to a tractable form. Tentative solutions are found using statistical analyses. With a performance index improved by 36% w.r.t. a naive solution, while using only 0.77% of the respective number of features, our results are promising. Deeper insights into our results are then unveiled using visual methods, namely high-dimensional data visualization and dimensionality reduction techniques. Finally, perspectives on new adaptation algorithms and on tighter integration of data mining and visualization principles are given.
Pages: 8