DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Cited by: 18
Authors
Khormali, Aminollah [1 ]
Yuan, Jiann-Shiun [1 ]
Affiliation
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 6
Keywords
cybersecurity; deep learning; deepfake detection; vision transformer;
DOI
10.3390/app12062953
CLC Number
O6 [Chemistry]
Discipline Code
0703
Abstract
The ever-growing threat of deepfakes and their large-scale societal implications have propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is the use of Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationships of pixels at different forgery scales. DFDT is specifically designed for deepfake detection tasks and consists of four main components: patch extraction & embedding, a multi-stream transformer block, attention-based patch selection, and a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The obtained results demonstrate DFDT's superior detection rates, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset & cross-manipulation generalization provides further strong evidence of its effectiveness.
Pages: 17
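The abstract above notes that DFDT's transformer layer replaces standard multi-head self-attention with a re-attention mechanism, i.e., attention maps are recombined across heads via a learnable mixing matrix before being applied to the values. The snippet below is only a minimal illustrative sketch of that general idea in PyTorch, assuming the common formulation Re-Attention(Q, K, V) = Norm(Theta^T Softmax(QK^T / sqrt(d))) V; it is not the authors' released implementation, and the module/parameter names (ReAttention, dim, num_heads, theta) are placeholders.

```python
# Minimal sketch of a re-attention layer (illustrative assumption, not DFDT's exact code).
import torch
import torch.nn as nn


class ReAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable mixing matrix Theta that recombines attention maps across heads;
        # this cross-head exchange is what distinguishes re-attention from plain
        # multi-head self-attention.
        self.theta = nn.Parameter(torch.eye(num_heads) + 0.01 * torch.randn(num_heads, num_heads))
        self.norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        # Project tokens to queries, keys, values: each (b, heads, n, head_dim).
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        # Standard scaled dot-product attention maps: (b, heads, n, n).
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Re-attention: mix the per-head attention maps with Theta, then re-normalize.
        attn = torch.einsum('hg,bgij->bhij', self.theta, attn)
        attn = self.norm(attn)
        # Aggregate values and merge heads back to (b, n, dim).
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


if __name__ == "__main__":
    layer = ReAttention(dim=256, num_heads=8)
    tokens = torch.randn(2, 197, 256)  # (batch, patches + class token, embedding dim)
    print(layer(tokens).shape)         # torch.Size([2, 197, 256])
```

In such a design, the cross-head mixing is intended to keep deeper transformer blocks from collapsing to near-identical attention maps; how DFDT combines this layer with its multi-stream block and attention-based patch selection is described in the paper itself.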