SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS

Cited by: 0
Authors
Dropulic, Branimir [1]
Suflaj, Miljenko [1]
Jertec, Andrej [1]
Obad, Leo [1]
Affiliation
[1] RealNetworks KONTXT, Seattle, WA 98104 USA
Keywords
Synthetic speech detection; text-to-speech; wav2vec 2.0; spoofing attack; multilingualism
DOI
10.1109/ICASSPW62465.2024.10627750
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Synthetic speech detection plays an important role in fending off the ever-increasing malicious use of voice deepfake technologies. However, its robustness and generalization have not yet been explored in diverse language settings. In this paper, we primarily analyze how such a system is affected by: (i) biases caused by different textual domains within human and synthetic samples, (ii) unseen languages, and (iii) non-native speech. Two human speech datasets, FLEURS and ARCTIC (CMU and L2), were extended with generated text-to-speech (TTS) samples. The results indicate that the wav2vec 2.0-based models are agnostic to the aforementioned factors.
Pages: 585-589
Number of pages: 5