New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

Cited by: 0

Authors:

Martinez-Gonzalez, Beatriz [1]

Manuel Pardo, Jose [1]

Echeverry-Correa, J. D. [1]

Montero, J. M. [1]

Affiliation:

[1] Univ Politecn Madrid, Grp Tecnol Habla, Ave Complutense S-N, Madrid 28040, Spain

Source:

PROCESAMIENTO DEL LENGUAJE NATURAL | 2014 / Issue 52

Keywords:

expressive speech synthesis; speaker diarization; speaking styles; voice building

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

Universal use of speech synthesis in different applications would require new voices to be easy to develop with little manual intervention. Considering the amount of multimedia data available on the internet and in the media, one interesting goal is to develop tools and methods to automatically build multi-style voices from such data. In a previous paper, a methodology for constructing such tools was sketched, and preliminary experiments with a multi-style database were presented. In this paper we further investigate this approach and propose several improvements to it based on the selection of the appropriate number of initial speakers, the use (or not) of noise reduction filters, the use of the F0 feature, and the use of a music detection algorithm. We have demonstrated that the best system using the music detection algorithm decreases the precision error by 22.36% relative for the development set and by 39.64% relative for the test set compared to the baseline, without degrading the merit factor. The average precision for the test set is 90.62%, ranging from 76.18% for reportages to 99.93% for meteorology reports.
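The "relative" figures in the abstract are reductions measured against the baseline error, not differences in percentage points. The following is a minimal sketch of that arithmetic, not something taken from the paper: it assumes that precision error equals 100 minus precision and that the 90.62% test-set average belongs to the best system. The function names are hypothetical, and the baseline value it derives is an inference under those assumptions, not a reported result.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a percentage of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


def implied_baseline_error(new_error: float, reduction_pct: float) -> float:
    """Invert relative_reduction: baseline error implied by the new error."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return new_error / (1.0 - reduction_pct / 100.0)


# Assumption (not stated in the abstract): precision error = 100 - precision,
# and the 90.62% average test precision is that of the best system.
test_error = 100.0 - 90.62

# Baseline test error implied by the reported 39.64% relative reduction.
# This is an inferred value (roughly 15.5%), not a figure from the paper.
baseline_test_error = implied_baseline_error(test_error, 39.64)

# Round trip: recomputing the relative reduction recovers 39.64.
recovered = relative_reduction(baseline_test_error, test_error)
```

Under these assumptions, a 39.64% relative reduction corresponds to a drop of about six percentage points in absolute precision error on the test set.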


Pages: 77-84

Page count: 8

