New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

Authors
Martinez-Gonzalez, Beatriz [1]
Manuel Pardo, Jose [1]
Echeverry-Correa, J. D. [1]
Montero, J. M. [1]
Affiliation
[1] Univ Politecn Madrid, Grp Tecnol Habla, Ave Complutense S-N, Madrid 28040, Spain
Keywords
expressive speech synthesis; speaker diarization; speaking styles; voice building
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
Universal use of speech synthesis across applications requires that new voices be easy to develop with little manual intervention. Given the large amount of multimedia data available on the internet and in the media, one interesting goal is to develop tools and methods that automatically build multi-style voices from it. In a previous paper, a methodology for constructing such tools was sketched, and preliminary experiments with a multi-style database were presented. In this paper we investigate this approach further and propose several improvements to it: selecting an appropriate number of initial speakers, deciding whether to apply noise reduction filters, using the F0 feature, and using a music detection algorithm. We show that the best system, which uses the music detection algorithm, reduces the precision error by 22.36% relative on the development set and by 39.64% relative on the test set compared to the baseline, without degrading the merit factor. The average precision on the test set is 90.62%, ranging from 76.18% for reportages to 99.93% for meteorology reports.
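The improvements above are stated as relative reductions of an error with respect to the baseline, i.e. (baseline - new) / baseline. The abstract does not give the absolute baseline errors, so the sketch below uses hypothetical values only to show how a relative figure such as 22.36% or 39.64% relates to absolute error rates. The function names `relative_reduction` and `error_after_reduction` are illustrative, not taken from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction (in %) of an error rate with respect to a baseline.

    Both arguments must be in the same units (e.g. percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


def error_after_reduction(baseline_error: float, reduction_pct: float) -> float:
    """Absolute error implied by a given relative reduction (in %) of a baseline."""
    return baseline_error * (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical baseline and improved error rates, for illustration only.
    print(relative_reduction(20.0, 15.0))  # 25.0
    # Error implied by a 39.64% relative reduction applied to a
    # hypothetical 10% baseline error (not a value from the paper).
    print(round(error_after_reduction(10.0, 39.64), 3))  # 6.036
```

Note that a relative reduction says nothing about the absolute gain on its own: the same 39.64% corresponds to different absolute improvements depending on how large the baseline error is.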
Pages: 77-84
Number of pages: 8