ViXNet: Vision Transformer with Xception Network for deepfakes based video and image forgery detection

Cited by: 25
Authors
Ganguly, Shreyan [1]
Ganguly, Aditya [2]
Mohiuddin, Sk [3]
Malakar, Samir [3]
Sarkar, Ram [2]
Affiliations
[1] Jadavpur Univ, Dept Construct Engn, Kolkata, India
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[3] Asutosh Coll, Dept Comp Sci, Kolkata, India
Keywords
Deepfakes; FaceSwap; Soft attention; Vision transformer; Forgery detection; Xception model
DOI
10.1016/j.eswa.2022.118423
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the advent of image generative technologies, facial manipulation techniques have grown rapidly, allowing people to easily modify media such as videos and images by replacing the identity or facial expression of the target person with that of another person. Colloquially, these manipulated videos and images are termed "deepfakes". As a result, every piece of content in digital media comes with a question: is this authentic? Hence, there is an unprecedented need for a competent deepfakes detection method. The rapid changes in forging methods make this a very challenging task, and thus generalization of the detection methods is also of utmost importance. However, the generalization strengths of the prevailing deepfakes detection methods are not satisfactory. In other words, these models perform well when trained and tested on the same dataset but fail to perform satisfactorily when trained on one dataset and tested on another. Most modern deep learning aided deepfakes detection techniques look for a consistent pattern among the leftover artifacts in specific facial regions of the target face rather than the entire face. To this end, we propose a Vision Transformer with Xception Network (ViXNet) to learn the consistency of these almost imperceptible artifacts left by deepfaking methods on the entire facial region. ViXNet comprises two branches: one tries to learn inconsistencies among local face region specifics by combining a patch-wise self-attention module and a vision transformer, and the other generates global spatial features using a deep convolutional neural network. To assess the performance of ViXNet, we evaluate it using two different experimental setups, intra-dataset and inter-dataset, on three standard deepfakes datasets: two video datasets, FaceForensics++ and Celeb-DF (V2), and one image dataset called Deepfakes.
We have attained AUC scores of 98.57% (83.60%), 99.26% (74.78%), and 98.93% (75.13%) using the intra(inter)-dataset experimental setups on the FaceForensics++, Celeb-DF (V2), and Deepfakes datasets, respectively. Additionally, we have evaluated ViXNet on the Deepfake Detection Challenge (DFDC) dataset, obtaining an AUC score of 86.32% and an F1-score of 79.06%. The performance of the proposed model is comparable to state-of-the-art methods. Moreover, the obtained results indicate the robustness and the generalization ability of the proposed model.
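As a rough illustration of the AUC metric reported in the abstract, the sketch below computes AUC from labels and detector scores in plain Python. It is not the authors' evaluation code: the function name `roc_auc` and all labels and scores are made up for illustration, and the "intra" and "inter" lists merely stand in for test sets drawn from the training dataset and from a different dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen fake (label 1)
    receives a higher score than a randomly chosen real (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT results from the paper).
# Intra-dataset: test split comes from the same dataset used for training.
intra_labels = [1, 1, 1, 0, 0, 0]
intra_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

# Inter-dataset: test data comes from a different dataset than training.
inter_labels = [1, 1, 1, 0, 0, 0]
inter_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]

intra_auc = roc_auc(intra_labels, intra_scores)
inter_auc = roc_auc(inter_labels, inter_scores)
print(f"intra-dataset AUC: {intra_auc:.4f}")
print(f"inter-dataset AUC: {inter_auc:.4f}")
```

The pairwise formulation is quadratic in the number of samples, which is fine for a small illustration; a rank-based computation would be the usual choice for full test sets.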
Pages: 15