DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Cited by: 18
Authors
Khormali, Aminollah [1 ]
Yuan, Jiann-Shiun [1 ]
Affiliation
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 6
Keywords
cybersecurity; deep learning; deepfake detection; vision transformer;
DOI
10.3390/app12062953
CLC Number
O6 [Chemistry]
Discipline Code
0703
Abstract
The ever-growing threat of deepfakes and their large-scale societal implications have propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is the use of Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationships of pixels at different forgery scales. DFDT is specifically designed for deepfake detection tasks and consists of four main components: patch extraction & embedding, a multi-stream transformer block, attention-based patch selection, and a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The obtained results demonstrate DFDT's superior detection rates, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset & cross-manipulation generalization provides further strong evidence of its effectiveness.
Pages: 17
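The abstract above notes that DFDT's transformer layer replaces standard multi-head self-attention with a re-attention mechanism, i.e., attention maps are recombined across heads via a learnable mixing matrix before being applied to the values. The snippet below is only a minimal illustrative sketch of that general idea in PyTorch, assuming the common formulation Re-Attention(Q, K, V) = Norm(Theta^T Softmax(QK^T / sqrt(d))) V; it is not the authors' released implementation, and the module/parameter names (ReAttention, dim, num_heads, theta) are placeholders.

```python
# Minimal sketch of a re-attention layer (illustrative assumption, not DFDT's exact code).
import torch
import torch.nn as nn


class ReAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable mixing matrix Theta that recombines attention maps across heads;
        # this cross-head exchange is what distinguishes re-attention from plain
        # multi-head self-attention.
        self.theta = nn.Parameter(torch.eye(num_heads) + 0.01 * torch.randn(num_heads, num_heads))
        self.norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        # Project tokens to queries, keys, values: each (b, heads, n, head_dim).
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        # Standard scaled dot-product attention maps: (b, heads, n, n).
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Re-attention: mix the per-head attention maps with Theta, then re-normalize.
        attn = torch.einsum('hg,bgij->bhij', self.theta, attn)
        attn = self.norm(attn)
        # Aggregate values and merge heads back to (b, n, dim).
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


if __name__ == "__main__":
    layer = ReAttention(dim=256, num_heads=8)
    tokens = torch.randn(2, 197, 256)  # (batch, patches + class token, embedding dim)
    print(layer(tokens).shape)         # torch.Size([2, 197, 256])
```

In such a design, the cross-head mixing is intended to keep deeper transformer blocks from collapsing to near-identical attention maps; how DFDT combines this layer with its multi-stream block and attention-based patch selection is described in the paper itself.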