Audio Spectrogram Transformer for Synthetic Speech Detection via Speech Formant Analysis

Cited by: 4
Authors
Cuccovillo, Luca [1]
Gerhardt, Milica [1]
Aichroth, Patrick [1]
Affiliations
[1] Fraunhofer Inst Digital Media Technol IDMT, Ehrenbergstr 31, Ilmenau, Germany
Keywords
synthetic speech detection; audio deepfakes; spectrogram transformer; voice formants
DOI
10.1109/WIFS58808.2023.10374615
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we address the challenge of synthetic speech detection, which has become increasingly important due to the latest advancements in text-to-speech and voice conversion technologies. We propose a novel multi-task neural network architecture, designed to be interpretable and specifically tailored for audio signals. The architecture includes a feature bottleneck, used to autoencode the input spectrogram, predict the fundamental frequency (f0) trajectory, and classify the speech as synthetic or natural. Hence, the synthesis detection can be considered a byproduct of attending to the energy distribution among vocal formants, providing a clear understanding of which characteristics of the input signal influence the final outcome. Our evaluation on the ASVspoof 2019 LA partition indicates better performance than the current state of the art, with an AUC score of 0.900.
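The abstract reports its result as an AUC score. As a minimal sketch of what that metric measures, the snippet below computes the area under the ROC curve as the fraction of (synthetic, natural) pairs in which the synthetic sample receives the higher detector score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper or from ASVspoof 2019.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 = synthetic (positive class), 0 = natural.
    scores: detector outputs, where higher means "more likely synthetic".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # synthetic sample ranked above natural one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 0.900 means a randomly chosen synthetic utterance outscores a randomly chosen natural one in about 90% of pairs.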
Pages: 6
Related papers
50 in total
  • [1] AUDIO TRANSFORMER FOR SYNTHETIC SPEECH DETECTION VIA FORMANT MAGNITUDE AND PHASE ANALYSIS
    Cuccovillo, Luca
    Gerhardt, Milica
    Aichroth, Patrick
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 4805 - 4809
  • [2] Audio Transformer for Synthetic Speech Detection via Benford's Law Distribution Analysis
    Ashoka, Anitha Bhat Talagini
    Cuccovillo, Luca
    Aichroth, Patrick
    PROCEEDINGS OF THE 3RD ACM INTERNATIONAL WORKSHOP ON MULTIMEDIA AI AGAINST DISINFORMATION, MAD 2024, 2024, : 23 - 29
  • [3] Deepfake Speech Detection: A Spectrogram Analysis
    Firc, Anton
    Malinka, Kamil
    Hanacek, Petr
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1312 - 1320
  • [4] Synthetic Speech Detection through Audio Folding
    Salvi, Davide
    Bestagini, Paolo
    Tubaro, Stefano
    PROCEEDINGS OF THE 2ND ACM INTERNATIONAL WORKSHOP ON MULTIMEDIA AI AGAINST DISINFORMATION, MAD 2023, 2023, : 3 - 9
  • [5] SINGLE-FORMANT SYNTHETIC SPEECH
    THOMAS, IB
    IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS, 1968, AU16 (02): : 288 - &
  • [6] Pitch detection and formant analysis of Arabic speech processing
    Cherif, A
    Bouafif, L
    Dabbabi, T
    APPLIED ACOUSTICS, 2001, 62 (10) : 1129 - 1140
  • [7] Detecting Human Emotion via Speech Recognition by Using Speech Spectrogram
    Prasomphan, Sathit
    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 113 - 122
  • [8] Synthesized Speech Attribution Using The Patchout Spectrogram Attribution Transformer
    Bhagtani, Kratika
    Bartusiak, Emily R.
    Yadav, Amit Kumar Singh
    Bestagini, Paolo
    Delp, Edward J.
    PROCEEDINGS OF THE 2023 ACM WORKSHOP ON INFORMATION HIDING AND MULTIMEDIA SECURITY, IH&MMSEC 2023, 2023, : 157 - 162
  • [9] SCHEME FOR AUTOMATIC FORMANT ANALYSIS OF SPEECH
    STRONG, WJ
    PURVES, RB
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1972, 51 (01): : 110 - &
  • [10] Fake Speech Detection Using Modulation Spectrogram
    Magazine, Raghav
    Agarwal, Ayush
    Hedge, Anand
    Prasanna, S. R. Mahadeva
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 451 - 463