Improving Video Vision Transformer for Deepfake Video Detection Using Facial Landmark, Depthwise Separable Convolution and Self Attention

Cited by: 1

Authors:

Ramadhani, Kurniawan Nur [1,2]

Munir, Rinaldi [1]

Utama, Nugraha Priya [1]

Affiliations:

[1] Bandung Inst Technol, Bandung 40132, Indonesia

[2] Telkom Univ, Bandung 40257, Indonesia

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Deepfake detection; facial landmark; depthwise separable convolution; convolution block attention module; video vision transformer;

DOI:

10.1109/ACCESS.2024.3352890

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

In this paper, we present the results of our research on deepfake video detection. We built a system that classifies a video as deepfake or real. Deepfake detection algorithms still struggle to achieve sufficient accuracy, especially on challenging deepfake datasets. Our system uses a spatiotemporal feature extracted with a Video Vision Transformer (ViViT). The main contribution of our research is a deepfake detection system that is based on the ViViT architecture and takes facial landmark area images as input. The system derives the spatiotemporal feature from a number of spatial features. Each spatial feature is extracted from a tubelet using a Depthwise Separable Convolution (DSC) block combined with a Convolution Block Attention Module (CBAM). A tubelet represents a facial landmark area extracted from the input video. We used 25 facial landmark areas per input video. In our experiment, we used the Celeb-DF version 2 dataset because it is considered a challenging deepfake dataset. We applied augmentation to the dataset and obtained 8335 videos for the training set, 390 videos for the validation set, and 1123 videos for the testing set. We trained the system with the Adam optimizer, a learning rate of 10^-4, and 100 epochs. In the experiment, we obtained an accuracy of 87.18% and an F1 score of 92.52%. We also conducted an ablation study to show the effect of each part of our model on overall system performance. These results indicate that, with landmark area images as input, our ViViT-based deepfake detection system performs well in detecting deepfake videos.
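As a rough illustration of why a Depthwise Separable Convolution block is a lightweight choice for the spatial feature extractor, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The abstract does not state kernel sizes or channel counts, so the layer sizes and function names here are hypothetical and chosen only for illustration; bias terms are omitted.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    dsc = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std} weights, depthwise separable: {dsc} weights")
    print(f"reduction factor: {std / dsc:.2f}x")
```

The saving grows with the number of output channels and the kernel size, since the depthwise separable count is a sum of two terms rather than their product.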


Pages: 8932-8939

Page count: 8
