New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

Authors
Martinez-Gonzalez, Beatriz [1]
Manuel Pardo, Jose [1]
Echeverry-Correa, J. D. [1]
Montero, J. M. [1]
Affiliations
[1] Univ Politecn Madrid, Grp Tecnol Habla, Ave Complutense S-N, Madrid 28040, Spain
Source
Keywords
expressive speech synthesis; speaker diarization; speaking styles; voice building
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Universal use of speech synthesis across applications would require new voices to be developed easily, with little manual intervention. Given the amount of multimedia data available on the internet and in the media, one interesting goal is to develop tools and methods that automatically build multi-style voices from such data. A previous paper sketched a methodology for constructing such tools and presented preliminary experiments with a multi-style database. In this paper we investigate this approach further and propose several improvements to it, based on selecting an appropriate number of initial speakers, using or omitting noise reduction filters, using the F0 feature, and using a music detection algorithm. We show that the best system using the music detection algorithm reduces the precision error by 22.36% relative on the development set and by 39.64% relative on the test set compared to the baseline, without degrading the merit factor. The average precision on the test set is 90.62%, ranging from 76.18% for reportages to 99.93% for meteorology reports.
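The "relative" reductions quoted in the abstract can be read as a change in error rate expressed as a fraction of the baseline error. The short Python sketch below illustrates that arithmetic only. It assumes the conventional definition of relative reduction; the abstract does not define "precision error", and the function name and the input values here are hypothetical, not figures from the paper.

```python
def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Return the reduction of an error rate relative to a baseline, in percent.

    Assumes the usual definition: (baseline - system) / baseline * 100.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates in percent (illustrative, not taken from the paper).
baseline = 20.0
improved = 15.0
print(relative_error_reduction(baseline, improved))  # 25.0
```

Under this reading, a drop of 5 absolute points from a 20% baseline error is a 25% relative reduction, which is why a relative figure can look much larger than the corresponding absolute change.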
Pages: 77-84
Number of pages: 8