HCiT: Deepfake Video Detection Using a Hybrid Model of CNN features and Vision Transformer

Cited by: 7
Authors
Kaddar, Bachir [1]
Fezza, Sid Ahmed [2]
Hamidouche, Wassim [3]
Akhtar, Zahid [4]
Hadid, Abdenour [5]
Affiliations
[1] Univ Ibn Khaldoun, Dept Nat Sci & Life, Tiaret, Algeria
[2] Natl Inst Telecommun & ICT, Oran, Algeria
[3] Univ Rennes, INSA Rennes, CNRS, IETR UMR 6164, Rennes, France
[4] State Univ New York Polytech Inst, Utica, NY USA
[5] Univ Polytech Hauts de France, Univ Lille, CNRS, Cent Lille,UMR 8520,IEMN, Valenciennes, France
Keywords
DeepFake video; detection; convolutional neural network; vision transformer; hybrid;
DOI
10.1109/VCIP53242.2021.9675402
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The volume of falsified video content is increasing dramatically, making the development of effective deepfake detection methods more urgent than ever. Even though many existing deepfake detection approaches show promising results, the majority of them still suffer from a number of critical limitations. In particular, they generalize poorly to unseen or new deepfake generation methods. Consequently, in this paper, we propose a deepfake detection method called HCiT, which combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The HCiT hybrid architecture combines the CNN's ability to extract local information with the ViT's self-attention mechanism to improve detection accuracy. In this hybrid architecture, the feature maps extracted by the CNN are fed into a ViT model that determines whether a specific video is fake or real. Experiments were performed on the FaceForensics++ and DeepFake Detection Challenge preview datasets, and the results show that the proposed method significantly outperforms the state-of-the-art methods. In addition, the HCiT method shows a great capacity for generalization on datasets covering various deepfake generation techniques. The source code is available at: https://github.com/KADDAR-Bachir/HCiT
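The data flow described in the abstract (CNN feature maps turned into tokens and passed through self-attention before a real/fake decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the random feature map stands in for real CNN output, the attention uses identity projections with a single head and no positional embeddings, and the names `feature_map_to_tokens`, `self_attention`, `classify`, `cls_token`, and `head_weights` are hypothetical. Treating the output as the probability of "fake" is also an assumption.

```python
import math
import random


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map into H*W tokens of dimension C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * value[i] for w, value in zip(weights, tokens))
                         for i in range(dim)])
    return attended


def classify(fmap, cls_token, head_weights, head_bias=0.0):
    """Prepend a class token, attend, and score the class token with a linear head."""
    tokens = [cls_token] + feature_map_to_tokens(fmap)
    attended = self_attention(tokens)
    logit = sum(w * x for w, x in zip(head_weights, attended[0])) + head_bias
    # Assumed convention: the sigmoid output is the probability of "fake".
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
C, H, W = 4, 3, 3  # toy sizes; a real CNN backbone yields far larger maps
fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
cls_token = [0.0] * C
head_weights = [random.uniform(-1, 1) for _ in range(C)]
prob = classify(fmap, cls_token, head_weights)
print(f"score for one frame: {prob:.3f}")
```

The sketch scores a single frame. How per-frame scores are aggregated into a video-level decision is not specified in the abstract and is left out here.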
Pages: 5