USING SELF ATTENTION DNNS TO DISCOVER PHONEMIC FEATURES FOR AUDIO DEEP FAKE DETECTION

Cited by: 2
Authors
Dhamyal, Hira [1]
Ali, Ayesha [1]
Qazi, Ihsan Ayyub [1]
Raza, Agha Ali [1]
Affiliations
[1] Lahore Univ Management Sci, Lahore, Pakistan
Keywords
spoof; bonafide; countermeasure; attention; phonemes; deep neural network; senet; explainable; fair; small datasets; forensics; deepfake; speech
DOI
10.1109/ASRU51503.2021.9688312
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With advances in natural-sounding speech synthesis models, it is becoming important to develop models that can detect spoofed audio. Speech synthesis models do not explicitly account for all factors affecting speech production, such as the shape, size and structure of a speaker's vocal tract. In this paper, we hypothesize that, due to practical limitations of audio corpora (including size, distribution, and balance of variables like gender, age, and accents), there exist certain phonemes that synthesis models are not able to replicate as well as the human articulation system, and that such phonemes differ in their spectral characteristics from bonafide speech. To discover such phonemes and quantify their effectiveness in distinguishing between spoofed and bonafide speech, we use a deep learning model with self-attention and analyze the attention weights of the trained model. We use the ASVSpoof2019 dataset for our analysis and find that the attention mechanism attends most to fricatives: /S/, /SH/; nasals: /M/, /N/; vowels: /Y/; and stops: /D/. Furthermore, we obtain 7.54% EER on train and 11.98% on dev data when using only the top-16 most attended phonemes from input audio, which is better than when any other phoneme classes are used.
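The two quantities the abstract relies on, a per-phoneme ranking derived from attention weights and the equal error rate (EER), can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one attention weight and one phoneme label per frame (the abstract does not say how weights are aligned to phonemes), it assumes higher scores mean "more bonafide", and the function names are hypothetical.

```python
import numpy as np

def mean_attention_per_phoneme(attn, phonemes):
    """Average frame-level attention weights over frames sharing a phoneme label.

    Assumption: `attn` and `phonemes` are aligned, one entry per frame.
    """
    totals, counts = {}, {}
    for weight, label in zip(attn, phonemes):
        totals[label] = totals.get(label, 0.0) + float(weight)
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}

def top_k_phonemes(scores, k):
    """Return the k phoneme labels with the highest mean attention."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption: a trial is accepted as bonafide when its score >= threshold.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # False acceptance: spoofed trials accepted; false rejection: bonafide rejected.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    frr = np.array([(bona < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)

# Toy values for illustration only; they are not data from the paper.
scores = mean_attention_per_phoneme([0.5, 0.3, 0.1, 0.1], ["S", "S", "AA", "M"])
print(top_k_phonemes(scores, 1))
print(equal_error_rate([0.9, 0.4], [0.6, 0.1]))
```

In this reading, the "top-16 most attended phonemes" would correspond to `top_k_phonemes(scores, 16)` computed over the whole corpus, and the reported EER figures would come from scoring audio restricted to those phonemes.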
Pages: 1178-1184
Number of pages: 7