SFormer: An end-to-end spatio-temporal transformer architecture for deepfake detection

Cited by: 0
Authors
Kingra, Staffy [1, 2]
Aggarwal, Naveen [1 ]
Kaur, Nirmal [1 ]
Affiliations
[1] Panjab Univ, UIET, Chandigarh, India
[2] SGT Univ, FEAT, Gurugram, Haryana, India
Keywords
Deepfake detection; Digital forensics; Facial manipulation detection; Spatio-temporal; Transformer; CNN
DOI
10.1016/j.fsidi.2024.301817
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Advances in AI continue to drive improvements in generative adversarial networks (GANs), which in turn facilitate the generation of deepfake media. Manipulated media poses serious risks in court proceedings, journalism, politics, and other domains where digital media has a substantial impact on society. State-of-the-art deepfake detection techniques rely on convolutional networks for spatial analysis and recurrent networks for temporal analysis. Because transformers can capture long-range dependencies, both with a global spatial view and along the temporal sequence, this paper proposes a novel approach called "SFormer" that uses a transformer architecture for both spatial and temporal analysis to detect deepfakes. Furthermore, state-of-the-art techniques suffer from high computational complexity and overfitting, which reduces their generalizability. The proposed model uses a Swin Transformer for spatial analysis, which results in low complexity and thereby enhances its generalization ability and robustness against different manipulation types. The proposed end-to-end transformer-based model, SFormer, is shown to be effective on several deepfake datasets, including FF++, DFD, Celeb-DF, DFDC, and Deeper-Forensics, achieving accuracies of 100%, 97.81%, 99.1%, 93.67%, and 100%, respectively. Moreover, SFormer demonstrates superior performance compared to existing spatio-temporal and transformer-based approaches for deepfake detection.
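The abstract's point that a transformer can relate frames across the whole temporal sequence can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. It assumes each video frame has already been reduced to a feature vector by some spatial encoder (the abstract names a Swin Transformer for that stage; here the vectors are placeholders). The function names `temporal_self_attention` and `classify_video`, the omission of learned projections, and the mean-pool plus logistic head are all illustrative assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def temporal_self_attention(frames):
    """Single-head scaled dot-product self-attention over a frame sequence.

    `frames` is a list of T per-frame feature vectors. For brevity the
    queries, keys, and values are the features themselves (no learned
    projections; an assumption of this sketch), so the code only shows
    that every frame attends to every other frame, near or far in time.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        weights = softmax([dot(query, key) / scale for key in frames])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, frames)) for i in range(dim)]
        )
    return attended


def classify_video(frames, weights, bias):
    """Mean-pool the attended frames and apply a logistic head.

    Returns a score in (0, 1); reading it as P(fake) is a hypothetical
    convention. `weights` and `bias` stand in for trained parameters.
    """
    attended = temporal_self_attention(frames)
    num_frames = len(attended)
    pooled = [sum(f[i] for f in attended) / num_frames for i in range(len(weights))]
    logit = dot(pooled, weights) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    # Eight frames with four placeholder features each.
    video = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
    head = [rng.uniform(-0.5, 0.5) for _ in range(4)]
    print(classify_video(video, head, 0.0))
```

A real model would add learned query, key, and value projections, multiple heads, positional encodings, and stacked layers; the sketch keeps only the attention step that lets distant frames influence one another.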
Pages: 15