Enhancing Voice Cloning Quality through Data Selection and Alignment-Based Metrics

Authors
Gonzalez-Docasal, Ander [1, 2]
Alvarez, Aitor [1]
Affiliations
[1] Basque Res & Technol Alliance BRTA, Fdn Vicomtech, Donostia San Sebastian 20009, Spain
[2] Univ Zaragoza, Dept Elect Engn & Commun, Zaragoza 50009, Spain
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 14
Keywords
voice cloning; speech synthesis; speech quality evaluation; CORPUS
DOI
10.3390/app13148049
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Voice cloning, an emerging field in speech processing, aims to generate synthetic utterances that closely resemble the voices of specific individuals. In this study, we investigated the impact of various techniques on the quality of voice cloning, focusing on a low-quality dataset. As a point of comparison, we also used two high-quality corpora. We conducted exhaustive evaluations of the quality of the gathered corpora in order to select the most suitable data for training a voice-cloning system. Following these measurements, we conducted a series of ablations, removing audio files with a lower signal-to-noise ratio and higher variability in utterance speed from the corpora in order to decrease their heterogeneity. Furthermore, we introduced a novel algorithm that calculates the fraction of aligned input characters by exploiting the attention matrix of the Tacotron 2 text-to-speech system. This algorithm provides a valuable metric for evaluating alignment quality during the voice-cloning process. We present the results of our experiments, demonstrating that the performed ablations significantly increased the quality of synthesised audio for the challenging low-quality corpus. Notably, our findings indicate that models trained on a 3 h corpus starting from a pre-trained model exhibit audio quality comparable to that of models trained from scratch on significantly larger amounts of data.
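The abstract does not spell out how the fraction of aligned input characters is computed, so the following Python sketch shows only one plausible reading, not the authors' algorithm. It assumes the attention matrix is given as one row per decoder step and one column per input character, and it counts a character as aligned when it receives the peak attention of at least one decoder step with a weight of at least `min_weight`. The function name `aligned_char_fraction`, the `min_weight` threshold and its default value are all hypothetical.

```python
from typing import Sequence


def aligned_char_fraction(
    attention: Sequence[Sequence[float]],
    min_weight: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> float:
    """Return the share of input characters that are the confident attention
    peak of at least one decoder step.

    attention[t][n] is the weight decoder step t places on input character n.
    """
    if not attention:
        return 0.0
    n_chars = len(attention[0])
    if n_chars == 0:
        raise ValueError("attention rows must cover at least one input character")

    aligned = set()
    for row in attention:
        if len(row) != n_chars:
            raise ValueError("all attention rows must have the same length")
        # Index of the character this decoder step attends to most strongly.
        peak_index = max(range(n_chars), key=row.__getitem__)
        # Only count the step if its peak is sharp enough to be trusted.
        if row[peak_index] >= min_weight:
            aligned.add(peak_index)
    return len(aligned) / n_chars


# A sharp, roughly diagonal alignment covers every character.
sharp = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.2, 0.8],
]
# A diffuse alignment never commits to any character.
diffuse = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]

print(aligned_char_fraction(sharp))
print(aligned_char_fraction(diffuse))
```

Under these assumptions, a value near 1 suggests the model attended to the whole input text, while a low value points to skipped characters or a failed alignment.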
Pages: 19