Efficient deepfake detection using shallow vision transformer

Cited by: 1
Authors
Usmani, Shaheen [1]
Kumar, Sunil [1]
Sadhya, Debanjan [2]
Affiliations
[1] ABV Indian Inst Informat Technol & Management, Dept Informat Technol, Gwalior, Madhya Pradesh, India
[2] ABV Indian Inst Informat Technol & Management, Dept Comp Sci & Engn, Gwalior, Madhya Pradesh, India
Keywords
Convolutional neural network; Deepfake; Generative adversarial network; Vision transformer; NETWORKS; IMAGES
DOI
10.1007/s11042-023-15910-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deepfake is a deep learning-based technique that generates fake face images by mimicking the distribution of original images. Deepfake images can be used for malicious purposes such as creating fake news; hence, it is important to detect them at an early stage. Existing works on deepfake detection mainly focus on appearance-based features and require substantial computing resources, memory and training data to optimize the model. Since these resources may not be available in many situations, it is important to develop a lightweight model that can work under constrained resources. In this work, we propose a shallow vision transformer (ViT) for deepfake detection. Our proposed model uses an attention mechanism with a multi-head attention module. The attention mechanism highlights the important sections of deepfake images, whereas the multi-head attention module determines the attention that has to be given to each of the local-level features of an image. Finally, a softmax layer is used to classify an image as real or fake. The proposed model is shallow in that it has 16.48 times fewer parameters and approximately 2.97 times fewer FLOPs than the baseline vision transformer. Experiments on the Real Fake Face (RFF) and Real and Fake Face Detection (RFFD) datasets show that the model achieves accuracies of 92.15% and 88.52%, respectively, which exceed those of many existing state-of-the-art models for deepfake detection such as GoogleNet, XceptionNet, ResNet50, MesoNet, CNN and baseline vision transformers. Importantly, the shallow ViT achieves an accuracy of 90.94% when only half of the RFF dataset is used for training the model, thereby demonstrating its applicability in constrained scenarios.
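The abstract does not give the layer configuration of the shallow model, so the following is only a rough sketch of why reducing depth and width shrinks a vision transformer, together with the two-class softmax step it mentions. The parameter formula assumes a standard ViT encoder block (four attention projections, a two-layer MLP, two LayerNorms, all with biases). The "baseline" numbers are the commonly used ViT-Base settings; the "shallow" numbers are hypothetical values chosen for illustration and are not the authors' configuration. The function names are likewise illustrative.

```python
import math


def encoder_block_params(dim: int, mlp_dim: int) -> int:
    """Count trainable parameters in one standard ViT encoder block."""
    # Query, key, value and output projections, each dim x dim plus a bias.
    attention = 4 * dim * dim + 4 * dim
    # Two linear layers (dim -> mlp_dim -> dim), each with a bias.
    mlp = dim * mlp_dim + mlp_dim + mlp_dim * dim + dim
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * dim
    return attention + mlp + layer_norms


def encoder_params(depth: int, dim: int, mlp_dim: int) -> int:
    """Parameters of a stack of `depth` identical encoder blocks."""
    return depth * encoder_block_params(dim, mlp_dim)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# ViT-Base encoder settings (12 blocks, width 768, MLP width 3072).
baseline = encoder_params(depth=12, dim=768, mlp_dim=3072)

# Hypothetical shallow settings, NOT taken from the paper.
shallow = encoder_params(depth=4, dim=384, mlp_dim=768)

print(f"baseline encoder parameters: {baseline:,}")
print(f"shallow encoder parameters:  {shallow:,}")
print(f"reduction factor:            {baseline / shallow:.2f}x")

# Two-class head: illustrative logits ordered as [real, fake].
probs = softmax([2.0, 0.0])
label = "real" if probs[0] >= probs[1] else "fake"
print(f"P(real)={probs[0]:.3f}, P(fake)={probs[1]:.3f} -> {label}")
```

The count covers only the encoder stack; patch embedding, position embeddings and the classification head are left out, so the ratio printed here is not comparable to the 16.48 figure reported in the abstract.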
Pages: 12339-12362
Page count: 24
Journal: Multimedia Tools and Applications, 2024, vol. 83