HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer

Cited by: 7
Authors
Kaddar, Bachir [1]
Fezza, Sid Ahmed [2]
Hamidouche, Wassim [3]
Akhtar, Zahid [4]
Hadid, Abdenour [5]
Affiliations
[1] Univ Ibn Khaldoun, Dept Nat Sci & Life, Tiaret, Algeria
[2] Natl Inst Telecommun & ICT, Oran, Algeria
[3] Univ Rennes, INSA Rennes, CNRS, IETR UMR 6164, Rennes, France
[4] State Univ New York Polytech Inst, Utica, NY USA
[5] Univ Polytech Hauts de France, Univ Lille, CNRS, Cent Lille,UMR 8520,IEMN, Valenciennes, France
Keywords
DeepFake video; detection; convolutional neural network; vision transformer; hybrid
DOI
10.1109/VCIP53242.2021.9675402
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of new falsified video contents is dramatically increasing, making the need to develop effective deepfake detection methods more urgent than ever. Even though many existing deepfake detection approaches show promising results, the majority of them still suffer from a number of critical limitations. In particular, they generalize poorly to unseen or new deepfake generation methods. Consequently, in this paper, we propose a deepfake detection method called HCiT, which combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The HCiT hybrid architecture combines the CNN's ability to extract local information with the ViT's self-attention mechanism to improve detection accuracy. In this hybrid architecture, the feature maps extracted by the CNN are fed into the ViT model, which determines whether a specific video is fake or real. Experiments were performed on the FaceForensics++ and DeepFake Detection Challenge preview datasets, and the results show that the proposed method significantly outperforms state-of-the-art methods. In addition, the HCiT method shows a great capacity for generalization on datasets covering various deepfake generation techniques. The source code is available at: https://github.com/KADDAR-Bachir/HCiT
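The data flow described in the abstract (CNN feature maps become the token sequence that a self-attention model classifies) can be illustrated with a toy, standard-library-only sketch. This is not the authors' implementation: the function names, the identity query/key/value projections, the mean pooling, and the hand-picked classifier weights are all assumptions made for illustration, and the tiny hard-coded feature map stands in for the output of a real CNN backbone.

```python
import math


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map into H*W tokens, each of length C."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights), so only
    the attention mixing step itself is shown.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in tokens:
        weights = softmax([sum(a * b for a, b in zip(query, key)) / scale
                           for key in tokens])
        mixed.append([sum(w * value[d] for w, value in zip(weights, tokens))
                      for d in range(dim)])
    return mixed


def fake_probability(tokens, weights, bias=0.0):
    """Mean-pool the tokens, then apply a linear layer and a sigmoid."""
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-channel, 2x2 feature map standing in for a CNN backbone's output.
fmap = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
]
tokens = feature_map_to_tokens(fmap)      # 4 spatial tokens of dimension 2
attended = self_attention(tokens)         # each token attends to all others
prob = fake_probability(attended, weights=[0.5, -0.5])  # hypothetical weights
```

In a real hybrid model the tokens would come from a trained CNN and pass through several transformer encoder layers with learned projections; the sketch only shows why a feature map of shape C x H x W naturally yields H*W tokens for self-attention.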
Pages: 5