ViXNet: Vision Transformer with Xception Network for deepfakes based video and image forgery detection

Cited by: 25

Authors:

Ganguly, Shreyan [1]

Ganguly, Aditya [2]

Mohiuddin, Sk [3]

Malakar, Samir [3]

Sarkar, Ram [2]

Affiliations:

[1] Jadavpur Univ, Dept Construct Engn, Kolkata, India

[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India

[3] Asutosh Coll, Dept Comp Sci, Kolkata, India

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2022 / Vol. 210

Keywords:

Deepfakes; FaceSwap; Soft attention; Vision transformer; Forgery detection; Xception model;

DOI:

10.1016/j.eswa.2022.118423

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the advent of image generative technologies, facial manipulation techniques have grown rapidly, allowing people to easily modify media such as videos and images by replacing the identity or facial expression of the target person with that of another person. Colloquially, these manipulated videos and images are termed "deepfakes". As a result, every piece of content in digital media comes with a question: is this authentic? Hence, there is an unprecedented need for a competent deepfakes detection method. The rapid changes in forging methods make this a very challenging task, and thus generalization of the detection methods is also essential. However, the generalization ability of the prevailing deepfakes detection methods is not satisfactory: these models perform well when trained and tested on the same dataset but fail to perform satisfactorily when trained on one dataset and tested on another. Most modern deep-learning-aided deepfakes detection techniques look for a consistent pattern among the leftover artifacts in specific facial regions of the target face rather than the entire face. To this end, we propose a Vision Transformer with Xception Network (ViXNet) to learn the consistency of these almost imperceptible artifacts left by deepfaking methods over the entire facial region. ViXNet comprises two branches: one learns inconsistencies among local face regions by combining a patch-wise self-attention module with a vision transformer, and the other generates global spatial features using a deep convolutional neural network. To assess the performance of ViXNet, we evaluate it under two experimental setups, intra-dataset and inter-dataset, using two standard deepfakes video datasets, namely FaceForensics++ and Celeb-DF (V2), and one deepfakes image dataset called Deepfakes. We attained AUC scores of 98.57% (83.60%), 99.26% (74.78%), and 98.93% (75.13%) in the intra(inter)-dataset setups on the FaceForensics++, Celeb-DF (V2), and Deepfakes datasets, respectively. Additionally, we evaluated ViXNet on the Deepfake Detection Challenge (DFDC) dataset and obtained an AUC score of 86.32% and an F1-score of 79.06%. The performance of the proposed model is comparable to state-of-the-art methods, and the obtained results indicate its robustness and generalization ability.

Pages: 15

50 records in total

[21] E-Cap Net: an efficient-capsule network for shallow and deepfakes forgery detection
Hafsa Ilyas
Ali Javed
Khalid Mahmood Malik
Aun Irtaza
[J]. Multimedia Systems, 2023, 29 : 2165 - 2180
[23] Cascaded Network Based on EfficientNet and Transformer for Deepfake Video Detection
Deng, Liwei
Wang, Jiandong
Liu, Zhen
[J]. NEURAL PROCESSING LETTERS, 2023, 55 (06) : 7057 - 7076
[24] COMPRESSION NOISE BASED VIDEO FORGERY DETECTION
Ravi, Hareesh
Subramanyam, A. V.
Gupta, Gaurav
Kumar, B. Avinash
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014 : 5352 - 5356
[25] Detection of image recognition forgery technology under machine vision
Liu, Yong
Zhang, Yinjie
Wang, Zonghui
Cheng, Ruosi
Zhao, Xu
Shi, Baolan
[J]. INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2024, 45 (02) : 123 - 134
[26] PIXEL ESTIMATION BASED VIDEO FORGERY DETECTION
Subramanyam, A. V.
Emmanuel, Sabu
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 3038 - 3042
[27] ENF Based Video Forgery Detection Algorithm
Wang, Yufei
Hu, Yongjian
Liew, Alan Wee-Chung
Li, Chang-Tsun
[J]. INTERNATIONAL JOURNAL OF DIGITAL CRIME AND FORENSICS, 2020, 12 (01) : 131 - 156
[28] Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion
Zhao, Jinmiao
Shi, Zelin
Yu, Chuang
Liu, Yunpeng
[J]. Remote Sensing, 2024, 16 (22)
[29] Image Forgery Detection Based on Semantic Image Understanding
Ye, Kui
Dong, Jing
Wang, Wei
Xu, Jindong
Tan, Tieniu
[J]. COMPUTER VISION, PT I, 2017, 771 : 472 - 481
[30] A sequential convolutional neural network for image forgery detection
Kaur, Simranjot
Chopra, Sumit
Nayyar, Anchal
Sharma, Rajesh
Singh, Gagandeep
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 41311 - 41325