Efficient deepfake detection using shallow vision transformer

Cited by: 1

Authors:

Usmani, Shaheen [1]

Kumar, Sunil [1]

Sadhya, Debanjan [2]

Affiliations:

[1] ABV Indian Inst Informat Technol & Management, Dept Informat Technol, Gwalior, Madhya Pradesh, India

[2] ABV Indian Inst Informat Technol & Management, Dept Comp Sci & Engn, Gwalior, Madhya Pradesh, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 04

Keywords:

Convolutional neural network; Deepfake; Generative adversarial network; Vision transformer; NETWORKS; IMAGES;

DOI:

10.1007/s11042-023-15910-z

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Deepfake is a deep learning-based technique that generates fake face images by mimicking the distribution of original images. Deepfake images can be used for malicious purposes such as creating fake news; hence, it is important to detect them at an early stage. Existing works on deepfake detection mainly focus on appearance-based features and require substantial computing resources, memory and training data to optimize the model. Since these resources may not be available in many situations, it is important to develop a lightweight model that can work under constrained resources. In this work, we propose a shallow vision transformer for deepfake detection. Our proposed model uses an attention mechanism with a multi-head attention module. The attention mechanism highlights the important sections of deepfake images, whereas the multi-head attention module determines the attention that has to be given to each of the local-level features of an image. Finally, a softmax layer is used to classify an image as real or fake. The proposed model is shallow in that it has 16.48 times fewer parameters and approximately 2.97 times fewer FLOPs than the baseline vision transformer. Experiments on the Real Fake Face (RFF) and Real and Fake Face Detection (RFFD) datasets show that the model achieves accuracies of 92.15% and 88.52%, respectively, which exceed those of many existing state-of-the-art models for deepfake detection such as GoogleNet, XceptionNet, ResNet50, MesoNet, CNN and baseline vision transformers. Importantly, the shallow ViT achieves an accuracy of 90.94% when only half of the RFF dataset is used for training the model, thereby demonstrating its applicability in constrained scenarios.
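The abstract describes a multi-head attention module followed by a softmax layer that labels an image as real or fake. The following is a minimal, illustrative sketch of that general pattern in plain Python, not the authors' architecture: the weights are random, mean pooling over patch tokens is an assumption, and patch embedding, positional encoding, layer normalization and the MLP block are all omitted. Every name here (`multi_head_self_attention`, `classify`, `random_matrix`, the toy sizes) is hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_head(q, k, v):
    """Scaled dot-product attention for one head."""
    scale = math.sqrt(len(q[0]))
    weights = [
        softmax([sum(qi * ki for qi, ki in zip(q_row, k_row)) / scale for k_row in k])
        for q_row in q
    ]
    return matmul(weights, v), weights


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Split the projected tokens into heads, attend per head, then merge."""
    head_dim = len(tokens[0]) // num_heads
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * head_dim, (h + 1) * head_dim)
        out, weights = attention_head(
            [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]
        )
        head_outputs.append(out)
        head_weights.append(weights)
    # Concatenate the per-head outputs for each token, then apply the output projection.
    merged = [sum((head[i] for head in head_outputs), []) for i in range(len(tokens))]
    return matmul(merged, w_o), head_weights


def classify(tokens, weights, num_heads):
    """Attend over patch tokens, mean-pool (an assumption), and apply softmax."""
    attended, attn = multi_head_self_attention(tokens, *weights["attn"], num_heads)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logits = matmul([pooled], weights["cls"])[0]
    return softmax(logits), attn


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix; stands in for trained weights or patch embeddings."""
    return [[rng.gauss(0.0, rows ** -0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_patches, dim, num_heads = 9, 8, 2  # toy sizes, not taken from the paper
tokens = random_matrix(num_patches, dim, rng)
weights = {
    "attn": [random_matrix(dim, dim, rng) for _ in range(4)],  # W_q, W_k, W_v, W_o
    "cls": random_matrix(dim, 2, rng),  # two classes: real, fake
}
probs, attn = classify(tokens, weights, num_heads)
print({"real": probs[0], "fake": probs[1]})
```

Each row of a head's attention matrix sums to 1, so it can be read as how strongly one patch attends to every other patch; with untrained weights the printed probabilities carry no meaning.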

Pages: 12339-12362

Page count: 24
