An Open Dataset of Synthetic Speech

Cited by: 1
Authors
Yaroshchuk, Artem [1]
Papastergiopoulos, Christoforos [2]
Cuccovillo, Luca [1]
Aichroth, Patrick [1]
Votis, Konstantinos [2]
Tzovaras, Dimitrios [2]
Affiliations
[1] Fraunhofer Inst Digital Media Technol, Ilmenau, Germany
[2] Ctr Res & Technol Hellas, Thessaloniki, Greece
Keywords
datasets; neural networks; speech synthesis
DOI
10.1109/WIFS58808.2023.10374863
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper introduces a multilingual, multispeaker dataset composed of synthetic and natural speech, designed to foster research and benchmarking in synthetic speech detection. The dataset encompasses 18,993 audio utterances synthesized from text, along with their corresponding natural equivalents, representing approximately 17 hours of synthetic audio data. The dataset features synthetic speech generated by 156 voices spanning three languages, namely English, German, and Spanish, with a balanced gender representation. It targets state-of-the-art synthesis methods and has been released under a license allowing seamless extension and redistribution by the research community.
Pages: 6