Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Cited by: 54
Authors
Coccomini, Davide Alessandro [1]
Messina, Nicola [1]
Gennaro, Claudio [1]
Falchi, Fabrizio [1]
Affiliations
[1] Italian Natl Res Council CNR, Inst Informat Sci & Technol ISTI, Via G Moruzzi 1, I-56124 Pisa, Italy
Keywords
Deep fake detection; Transformer networks; Deep learning;
DOI
10.1007/978-3-031-06433-3_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining results comparable to those of some very recent methods that use Vision Transformers. Unlike the state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC).
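The abstract does not spell out the voting scheme, so the following is only a minimal sketch of one plausible reading: per-face fake probabilities are grouped by identity, averaged, and the video is flagged as fake if any identity's average exceeds a threshold. The function name `classify_video`, the identity labels, and the 0.5 threshold are all assumptions made for illustration and are not taken from the record.

```python
from collections import defaultdict
from statistics import mean


def classify_video(face_scores, threshold=0.5):
    """Return (is_fake, per_identity_mean) for one video.

    face_scores: iterable of (identity_id, fake_probability) pairs, one per
    detected face crop. Grouping by identity and the 0.5 threshold are
    assumptions of this sketch, not details confirmed by the abstract.
    """
    by_identity = defaultdict(list)
    for identity_id, score in face_scores:
        by_identity[identity_id].append(score)

    # Average the scores of each identity across all of its face crops.
    per_identity_mean = {k: mean(v) for k, v in by_identity.items()}

    # Flag the video if at least one identity looks manipulated on average.
    is_fake = any(m > threshold for m in per_identity_mean.values())
    return is_fake, per_identity_mean


# Hypothetical scores: identity "a" looks manipulated, identity "b" does not.
scores = [("a", 0.9), ("a", 0.7), ("b", 0.1), ("b", 0.2)]
is_fake, means = classify_video(scores)
print(is_fake, means)
```

Averaging within an identity before thresholding keeps a single noisy frame from deciding the outcome, while the "any identity" rule still catches videos in which only one of several faces has been manipulated.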
Pages: 219-229
Page count: 11