Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Cited by: 54
Authors
Coccomini, Davide Alessandro [1]
Messina, Nicola [1]
Gennaro, Claudio [1]
Falchi, Fabrizio [1]
Affiliations
[1] Italian Natl Res Council CNR, Inst Informat Sci & Technol ISTI, Via G Moruzzi 1, I-56124 Pisa, Italy
Keywords
Deep fake detection; Transformer networks; Deep learning
DOI
10.1007/978-3-031-06433-3_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to detect. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deepfake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining results comparable to those of some very recent methods that use Vision Transformers. Unlike the state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state of the art on the DeepFake Detection Challenge (DFDC).
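The abstract does not spell out the voting rule, so the following is only a minimal sketch of one plausible form of it, not the authors' procedure. It assumes that each detected face crop already has a fake probability from some classifier, that crops are grouped by a face identifier, and that a video is flagged as fake when any single face's mean score exceeds a threshold. The function name `classify_video`, the `(face_id, score)` input layout, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def classify_video(face_scores, threshold=0.5):
    """Aggregate per-crop fake probabilities into a video-level decision.

    face_scores: iterable of (face_id, fake_probability) pairs, one per
                 detected face crop (hypothetical input layout).
    threshold:   assumed decision threshold, not taken from the paper.
    Returns (is_fake, per_face_mean).
    """
    # Group the crop-level scores by the face they belong to.
    by_face = defaultdict(list)
    for face_id, score in face_scores:
        by_face[face_id].append(score)

    # Average the scores of each face across all of its crops.
    per_face_mean = {fid: mean(scores) for fid, scores in by_face.items()}

    # Assumed rule: one manipulated face is enough to flag the video.
    is_fake = any(m > threshold for m in per_face_mean.values())
    return is_fake, per_face_mean


if __name__ == "__main__":
    scores = [("a", 0.1), ("a", 0.2), ("b", 0.9), ("b", 0.7)]
    print(classify_video(scores))
```

Grouping by face before averaging keeps a single manipulated face from being diluted by the genuine faces that share the shot, which is the situation the abstract's multi-face handling refers to.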
Pages: 219-229
Number of pages: 11