USING SELF ATTENTION DNNS TO DISCOVER PHONEMIC FEATURES FOR AUDIO DEEP FAKE DETECTION

Cited by: 2
Authors
Dhamyal, Hira [1 ]
Ali, Ayesha [1 ]
Qazi, Ihsan Ayyub [1 ]
Raza, Agha Ali [1 ]
Affiliations
[1] Lahore Univ Management Sci, Lahore, Pakistan
Keywords
spoof; bonafide; countermeasure; attention; phonemes; deep neural network; SENet; explainable; fair; small datasets; forensics; deepfake; speech
DOI
10.1109/ASRU51503.2021.9688312
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
With the advancement of natural-sounding speech synthesis models, it is becoming increasingly important to develop models that can detect spoofed audio. Speech synthesis models do not explicitly account for all of the factors affecting speech production, such as the shape, size, and structure of a speaker's vocal tract. In this paper, we hypothesize that, due to practical limitations of audio corpora (including size, distribution, and the balance of variables such as gender, age, and accent), there exist certain phonemes that synthesis models cannot replicate as well as the human articulatory system, and that such phonemes differ in their spectral characteristics from bonafide speech. To discover such phonemes and quantify their effectiveness in distinguishing spoofed from bonafide speech, we use a deep learning model with self-attention and analyze the attention weights of the trained model. We use the ASVspoof 2019 dataset for our analysis and find that the attention mechanism focuses most on the fricatives /S/ and /SH/, the nasals /M/ and /N/, the vowel /Y/, and the stop /D/. Furthermore, we obtain 7.54% EER on the train set and 11.98% on the dev set when using only the top-16 most-attended phonemes from the input audio, better than with any other phoneme class.
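As a concrete illustration of the analysis described in the abstract, the following Python sketch ranks phonemes by the mean attention their frames receive and computes an equal error rate by a simple threshold sweep. This is not the authors' code: the per-frame attention weights (e.g., column sums of a self-attention map), the forced-alignment labels, the score convention (higher = more bonafide-like), and all function names and toy numbers are illustrative assumptions.

import numpy as np
from collections import defaultdict

def rank_phonemes_by_attention(attn_weights, frame_phonemes):
    # Mean attention received per phoneme label, sorted descending.
    # attn_weights: length-T array of per-frame attention mass
    #               (assumed, e.g., column sums of a T x T attention map).
    # frame_phonemes: length-T list of ARPAbet labels from a forced alignment.
    totals, counts = defaultdict(float), defaultdict(int)
    for w, ph in zip(attn_weights, frame_phonemes):
        totals[ph] += float(w)
        counts[ph] += 1
    means = {ph: totals[ph] / counts[ph] for ph in totals}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

def compute_eer(bona_scores, spoof_scores):
    # Equal error rate: sweep thresholds and report the operating point
    # where false rejection (bonafide) and false acceptance (spoof) meet.
    thresholds = np.sort(np.concatenate([bona_scores, spoof_scores]))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = float(np.mean(bona_scores < t))    # bonafide wrongly rejected
        far = float(np.mean(spoof_scores >= t))  # spoof wrongly accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy usage with made-up numbers (not from the paper):
attn = np.array([0.9, 0.8, 0.1, 0.2, 0.7, 0.1])
phones = ["S", "S", "AA", "AA", "SH", "D"]
print(rank_phonemes_by_attention(attn, phones)[:3])
print(compute_eer(np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.3, 0.75])))

In the paper's setting, the phoneme ranking would be accumulated over the whole training set, and only frames belonging to the top-16 most-attended phonemes would then be kept as input when re-scoring; the per-utterance aggregation shown here is a simplifying assumption.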
Pages: 1178-1184
Page count: 7
Related papers
50 items in total
  • [1] PARTIALLY FAKE AUDIO DETECTION BY SELF-ATTENTION-BASED FAKE SPAN DISCOVERY
    Wu, Haibin
    Kuo, Heng-Cheng
    Zheng, Naijun
    Hung, Kuo-Hsuan
    Lee, Hung-yi
    Tsao, Yu
    Wang, Hsin-Min
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9236 - 9240
  • [2] Audio Deep Fake Detection with Sonic Sleuth Model
    Alshehri, Anfal
    Almalki, Danah
    Alharbi, Eaman
    Albaradei, Somayah
    COMPUTERS, 2024, 13 (10)
  • [3] Convergence of Deep Learning and Forensic Methodologies Using Self-attention Integrated EfficientNet Model for Deep Fake Detection
    Singh, Rimjhim Padam
    Nichenametla, Hima Sree
    Koti, Leela Sai Praneeth Reddy
    Kandukuri, Jashwanth
    SN COMPUTER SCIENCE, 5 (8)
  • [4] Detecting Fake Audio of Arabic Speakers Using Self-Supervised Deep Learning
    Almutairi, Zaynab M.
    Elgibreen, Hebah
    IEEE ACCESS, 2023, 11 : 72134 - 72147
  • [5] Generalized Fake Audio Detection via Deep Stable Learning
    Wang, Zhiyong
    Fu, Ruibo
    Wen, Zhengqi
    Xie, Yuankun
    Liu, Yukun
    Wang, Xiaopeng
    Liu, Xuefei
    Li, Yongwei
    Tao, Jianhua
    Qi, Xin
    Lu, Yi
    Shi, Shuchen
    INTERSPEECH 2024, 2024, : 4773 - 4777
  • [6] Learning Contextual Features with Multi-head Self-attention for Fake News Detection
    Wang, Yangqian
    Han, Hao
    Ding, Ye
    Wang, Xuan
    Liao, Qing
    COGNITIVE COMPUTING - ICCC 2019, 2019, 11518 : 132 - 142
  • [7] Detecting Forged Audio Files Using "Mixed Paste" Command: A Deep Learning Approach Based on Korean Phonemic Features
    Son, Yeongmin
    Park, Jae Wan
    SENSORS, 2024, 24 (06)
  • [8] Fake news detection and classification using hybrid BiLSTM and self-attention model
    Mohapatra, Asutosh
    Thota, Nithin
    Prakasam, P.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (13) : 18503 - 18519
  • [9] AENeT: an attention-enabled neural architecture for fake news detection using contextual features
    Jain, Vidit
    Kaliyar, Rohit Kumar
    Goswami, Anurag
    Narang, Pratik
    Sharma, Yashvardhan
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (01): 771 - 782