DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Cited by: 18
Authors
Khormali, Aminollah [1 ]
Yuan, Jiann-Shiun [1 ]
Affiliations
[1] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 6
Keywords
cybersecurity; deep learning; deepfake detection; vision transformer;
DOI
10.3390/app12062953
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
The ever-growing threat of deepfakes and their large-scale societal implications has propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is the use of Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationship of pixels at different forgery scales. DFDT is specifically designed for deepfake detection and consists of four main components: patch extraction and embedding, a multi-stream transformer block, and attention-based patch selection followed by a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The obtained results demonstrate DFDT's superior detection rates, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset and cross-manipulation generalization provides additional strong evidence of its effectiveness.
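Note: this record contains no code from the paper. The snippet below is a minimal, hypothetical PyTorch sketch of the re-attention idea named in the abstract (a learnable mixing of per-head attention maps before they are applied to the values, in the spirit of DeepViT), used here in place of standard multi-head self-attention. The module name ReAttention, the 1x1 head-mixing convolution, the BatchNorm over heads, and all parameter choices are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Hypothetical re-attention block: per-head attention maps are mixed
    across heads by a learnable 1x1 transformation before being applied
    to the values (in the spirit of DeepViT's re-attention)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0, "embedding dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable head-mixing matrix (num_heads x num_heads), realised as a
        # 1x1 convolution over the head dimension of the attention maps.
        self.head_mix = nn.Conv2d(num_heads, num_heads, kernel_size=1, bias=False)
        self.head_norm = nn.BatchNorm2d(num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, H, N, N)
        attn = attn.softmax(dim=-1)
        attn = self.head_norm(self.head_mix(attn))       # re-attention step
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Quick shape check on dummy patch-token embeddings (batch=2, 197 tokens, dim=768).
tokens = torch.randn(2, 197, 768)
print(ReAttention(dim=768, num_heads=8)(tokens).shape)   # torch.Size([2, 197, 768])

The head-mixing step is what distinguishes this from plain multi-head self-attention: it lets heads exchange attention information, which the re-attention literature motivates as a way to keep attention maps from collapsing in deeper transformer stacks.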
Pages: 17
Related Papers (50 in total)
  • [1] SFormer: An end-to-end spatio-temporal transformer architecture for deepfake detection
    Kingra, Staffy
    Aggarwal, Naveen
    Kaur, Nirmal
    [J]. FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION, 2024, 51
  • [2] End-to-End Multitask Learning With Vision Transformer
    Tian, Yingjie
    Bai, Kunlong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9579 - 9590
  • [3] End-to-End Computer Vision Framework
    Orhei, Ciprian
    Mocofan, Muguras
    Vert, Silviu
    Vasiu, Radu
    [J]. 2020 14TH INTERNATIONAL SYMPOSIUM ON ELECTRONICS AND TELECOMMUNICATIONS (ISETC), 2020, : 63 - 66
  • [4] Towards Spatio-temporal Collaborative Learning: An End-to-End Deepfake Video Detection Framework
    Guo, Wenxuan
    Du, Shuo
    Deng, Huiyuan
    Yu, Zikang
    Feng, Lin
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [5] End-to-End Temporal Action Detection With Transformer
    Liu, Xiaolong
    Wang, Qimeng
    Hu, Yao
    Tang, Xu
    Zhang, Shiwei
    Bai, Song
    Bai, Xiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5427 - 5441
  • [6] End-to-end lane detection with convolution and transformer
    Ge, Zekun
    Ma, Chao
    Fu, Zhumu
    Song, Shuzhong
    Si, Pengju
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 29607 - 29627
  • [7] DefectTR: End-to-end defect detection for sewage networks using a transformer
    Dang, L. Minh
    Wang, Hanxiang
    Li, Yanfen
    Nguyen, Tan N.
    Moon, Hyeonjoon
    [J]. CONSTRUCTION AND BUILDING MATERIALS, 2022, 325
  • [8] SRDD: a lightweight end-to-end object detection with transformer
    Zhu, Yuan
    Xia, Qingyuan
    Jin, Wen
    [J]. CONNECTION SCIENCE, 2022, 34 (01) : 2448 - 2465
  • [9] Transformer Based End-to-End Mispronunciation Detection and Diagnosis
    Wu, Minglin
    Li, Kun
    Leung, Wai-Kim
    Meng, Helen
    [J]. INTERSPEECH 2021, 2021, : 3954 - 3958