DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Cited by: 18
Authors
Khormali, Aminollah [1 ]
Yuan, Jiann-Shiun [1 ]
Affiliations
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 6
Keywords
cybersecurity; deep learning; deepfake detection; vision transformer;
DOI
10.3390/app12062953
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
The ever-growing threat of deepfakes and their large-scale societal implications has propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is the use of Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationship of pixels at different forgery scales. DFDT is specifically designed for deepfake detection and consists of four main components: patch extraction and embedding, a multi-stream transformer block, and attention-based patch selection followed by a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The obtained results demonstrate DFDT's superior detection rates, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset and cross-manipulation generalization provides additional strong evidence of its effectiveness.
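Note: this record contains no code from the paper. The snippet below is a minimal, hypothetical PyTorch sketch of the re-attention idea named in the abstract (a learnable mixing of per-head attention maps before they are applied to the values, in the spirit of DeepViT), used here in place of standard multi-head self-attention. The module name ReAttention, the 1x1 head-mixing convolution, the BatchNorm over heads, and all parameter choices are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Hypothetical re-attention block: per-head attention maps are mixed
    across heads by a learnable 1x1 transformation before being applied
    to the values (in the spirit of DeepViT's re-attention)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0, "embedding dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable head-mixing matrix (num_heads x num_heads), realised as a
        # 1x1 convolution over the head dimension of the attention maps.
        self.head_mix = nn.Conv2d(num_heads, num_heads, kernel_size=1, bias=False)
        self.head_norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, H, N, N)
        attn = attn.softmax(dim=-1)
        attn = self.head_norm(self.head_mix(attn))       # re-attention step
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Quick shape check on dummy patch-token embeddings (batch=2, 197 tokens, dim=768).
tokens = torch.randn(2, 197, 768)
print(ReAttention(dim=768, num_heads=8)(tokens).shape)   # torch.Size([2, 197, 768])

The head-mixing step is what distinguishes this from plain multi-head self-attention: it lets heads exchange attention information, which the re-attention literature motivates as a way to keep attention maps from collapsing in deeper transformer stacks.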
Pages: 17
Related Papers (50 in total)
  • [1] SFormer: An end-to-end spatio-temporal transformer architecture for deepfake detection
    Kingra, Staffy
    Aggarwal, Naveen
    Kaur, Nirmal
    [J]. FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION, 2024, 51
  • [2] End-to-End Multitask Learning With Vision Transformer
    Tian, Yingjie
    Bai, Kunlong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9579 - 9590
  • [3] End-to-End Computer Vision Framework
    Orhei, Ciprian
    Mocofan, Muguras
    Vert, Silviu
    Vasiu, Radu
    [J]. 2020 14TH INTERNATIONAL SYMPOSIUM ON ELECTRONICS AND TELECOMMUNICATIONS (ISETC), 2020, : 63 - 66
  • [4] Towards Spatio-temporal Collaborative Learning: An End-to-End Deepfake Video Detection Framework
    Guo, Wenxuan
    Du, Shuo
    Deng, Huiyuan
    Yu, Zikang
    Feng, Lin
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [5] End-to-End Temporal Action Detection With Transformer
    Liu, Xiaolong
    Wang, Qimeng
    Hu, Yao
    Tang, Xu
    Zhang, Shiwei
    Bai, Song
    Bai, Xiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5427 - 5441
  • [6] End-to-end lane detection with convolution and transformer
    Ge, Zekun
    Ma, Chao
    Fu, Zhumu
    Song, Shuzhong
    Si, Pengju
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 29607 - 29627
  • [7] DefectTR: End-to-end defect detection for sewage networks using a transformer
    Dang, L. Minh
    Wang, Hanxiang
    Li, Yanfen
    Nguyen, Tan N.
    Moon, Hyeonjoon
    [J]. CONSTRUCTION AND BUILDING MATERIALS, 2022, 325
  • [8] SRDD: a lightweight end-to-end object detection with transformer
    Zhu, Yuan
    Xia, Qingyuan
    Jin, Wen
    [J]. CONNECTION SCIENCE, 2022, 34 (01) : 2448 - 2465
  • [9] Transformer Based End-to-End Mispronunciation Detection and Diagnosis
    Wu, Minglin
    Li, Kun
    Leung, Wai-Kim
    Meng, Helen
    [J]. INTERSPEECH 2021, 2021, : 3954 - 3958