SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS

Authors
Dropulic, Branimir [1 ]
Suflaj, Miljenko [1 ]
Jertec, Andrej [1 ]
Obad, Leo [1 ]
Affiliations
[1] RealNetworks KONTXT, Seattle, WA 98104 USA
Keywords
Synthetic speech detection; text-to-speech; wav2vec 2.0; spoofing attack; multilingualism
DOI
10.1109/ICASSPW62465.2024.10627750
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206 ; 082403 ;
Abstract
Synthetic speech detection plays an important role in fending off ever-increasing malicious use of voice deepfake technologies. However, its robustness and generalization have not yet been explored in diverse language settings. In this paper, we primarily analyze how such a system is affected by: (i) biases caused by different textual domains within human and synthetic samples, (ii) unseen languages, and (iii) non-native speech. Two human speech datasets, FLEURS and ARCTIC (CMU and L2), were extended with generated text-to-speech (TTS) samples. The results indicate that the wav2vec 2.0 based models are agnostic to the aforementioned points.
Pages: 585-589
Page count: 5