New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

Cited by: 0

Authors:

Martinez-Gonzalez, Beatriz [1]

Manuel Pardo, Jose [1]

Echeverry-Correa, J. D. [1]

Montero, J. M. [1]

Affiliations:

[1] Univ Politecn Madrid, Grp Tecnol Habla, Ave Complutense S-N, Madrid 28040, Spain

Source:

PROCESAMIENTO DEL LENGUAJE NATURAL | 2014 / Issue 52

Keywords:

expressive speech synthesis; speaker diarization; speaking styles; voice building

DOI:

Not available

Chinese Library Classification:

H0 [Linguistics]

Discipline classification codes:

030303; 0501; 050102

Abstract:

Universal use of speech synthesis in different applications would require easy development of new voices with little manual intervention. Considering the amount of multimedia data available on the internet and in the media, one interesting goal is to develop tools and methods to automatically build multi-style voices from it. In a previous paper, a methodology for constructing such tools was sketched, and preliminary experiments with a multi-style database were presented. In this paper we further investigate this approach and propose several improvements to it based on the selection of the appropriate number of initial speakers, the use or not of noise reduction filters, the use of the F0 feature, and the use of a music detection algorithm. We have demonstrated that the best system using a music detection algorithm decreases the precision error by 22.36% relative for the development set and by 39.64% relative for the test set compared to the baseline, without degrading the merit factor. The average precision for the test set is 90.62%, ranging from 76.18% for reportages to 99.93% for meteorology reports.


Pages: 77-84

Page count: 8
