Improving Video Vision Transformer for Deepfake Video Detection Using Facial Landmark, Depthwise Separable Convolution and Self Attention

Cited by: 1
Authors
Ramadhani, Kurniawan Nur [1, 2]
Munir, Rinaldi [1 ]
Utama, Nugraha Priya [1 ]
Affiliations
[1] Bandung Inst Technol, Bandung 40132, Indonesia
[2] Telkom Univ, Bandung 40257, Indonesia
Keywords
Deepfake detection; facial landmark; depthwise separable convolution; convolution block attention module; video vision transformer
DOI
10.1109/ACCESS.2024.3352890
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present the results of our research on deepfake video detection. We built a system that classifies a video as either deepfake or real. Existing deepfake detection algorithms still struggle to achieve sufficient accuracy, especially on challenging deepfake datasets. Our system uses spatiotemporal features extracted with a Video Vision Transformer (ViViT). The main contribution of our research is a deepfake detection system that is based on the ViViT architecture and takes facial landmark area images as input. The system extracts its features from a number of spatial features. Each spatial feature is extracted from a tubelet using a Depthwise Separable Convolution (DSC) block combined with a Convolution Block Attention Module (CBAM). A tubelet is a representation of a facial landmark area extracted from the input video; we use 25 facial landmark areas per input video. In our experiments we used the Celeb-DF version 2 dataset because it is considered a challenging deepfake dataset. After augmenting the dataset, we obtained 8335 videos for the training set, 390 videos for the validation set, and 1123 videos for the testing set. We trained the system with the Adam optimizer, a learning rate of 10^-4, and 100 epochs. In the experiment, we obtained an accuracy of 87.18% and an F1 score of 92.52%. We also conducted an ablation study to show the effect of each part of our model on overall system performance. From this research, we conclude that, by using landmark area images, our ViViT-based deepfake detection system performs well in detecting deepfake videos.
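As a rough illustration of why a Depthwise Separable Convolution block is attractive as a lightweight spatial feature extractor, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The channel sizes and kernel size are hypothetical values chosen for illustration; they are not taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not values reported in the paper.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    dsc = dsc_params(c_in, c_out, k)
    print(f"standard: {std} weights, depthwise separable: {dsc} weights")
    print(f"reduction factor: {std / dsc:.2f}x")
```

The saving grows with the kernel size and the number of output channels, which is the usual argument for placing such a block in front of a transformer encoder.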
Pages: 8932-8939
Page count: 8