SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS

Cited by: 0
Authors
Dropulic, Branimir [1]
Suflaj, Miljenko [1]
Jertec, Andrej [1]
Obad, Leo [1]
Affiliations
[1] RealNetworks KONTXT, Seattle, WA 98104 USA
Keywords
Synthetic speech detection; text-to-speech; wav2vec 2.0; spoofing attack; multilingualism
DOI
10.1109/ICASSPW62465.2024.10627750
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Synthetic speech detection plays an important role in fending off the ever-increasing malicious use of voice deepfake technologies. However, its robustness and generalization have not yet been explored in diverse language settings. In this paper, we primarily analyze how such a system is affected by: (i) biases caused by different textual domains within human and synthetic samples, (ii) unseen languages, and (iii) non-native speech. Two human speech datasets, FLEURS and ARCTIC (CMU and L2), were extended with generated text-to-speech (TTS) samples. The results indicate that wav2vec 2.0-based models are agnostic to the aforementioned points.
Pages: 585-589
Number of pages: 5
Related papers
50 in total
  • [1] Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0
    Kunesova, Marie
    Rezackova, Marketa
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 377 - 388
  • [2] Explore Wav2vec 2.0 for Mispronunciation Detection
    Xu, Xiaoshuo
    Kang, Yueteng
    Cao, Songjun
    Lin, Binghuai
    Ma, Long
    INTERSPEECH 2021, 2021, : 4428 - 4432
  • [3] Speech recognition model design for Sundanese language using WAV2VEC 2.0
    Cryssiover A.
    Zahra A.
    International Journal of Speech Technology, 2024, 27 (01) : 171 - 177
  • [4] Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
    Stefanel Gris, Lucas Rafael
    Casanova, Edresson
    de Oliveira, Frederico Santos
    Soares, Anderson da Silva
    Candido Junior, Arnaldo
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 333 - 343
  • [5] Exploring wav2vec 2.0 on speaker verification and language identification
    Fan, Zhiyun
    Li, Meng
    Zhou, Shiyu
    Xu, Bo
    INTERSPEECH 2021, 2021, : 1509 - 1513
  • [6] Comparison of wav2vec 2.0 models on three speech processing tasks
    Kunešová, Marie
    Zajíc, Zbyněk
    Šmídl, Luboš
    Karafiát, Martin
    International Journal of Speech Technology, 2024, 27 (04) : 847 - 859
  • [7] WavFusion: Towards Wav2vec 2.0 Multimodal Speech Emotion Recognition
    Li, Feng
    Luo, Jiusong
    Xia, Wanjun
    MULTIMEDIA MODELING, MMM 2025, PT IV, 2025, 15523 : 325 - 336
  • [8] A Preliminary Study on Wav2Vec 2.0 Embeddings for Text-to-Speech
    Lim, Yohan
    Kim, Namhyeong
    Yun, Seung
    Kim, Hun
    Lee, Seung-Ik
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 343 - 347
  • [9] Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
    Pepino, Leonardo
    Riera, Pablo
    Ferrer, Luciana
    INTERSPEECH 2021, 2021, : 3400 - 3404
  • [10] Learning Music Representations with wav2vec 2.0
    Ragano, Alessandro
    Benetos, Emmanouil
    Hines, Andrew
    2023 31ST IRISH CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COGNITIVE SCIENCE, AICS, 2023,