High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction

Cited by: 14
Authors
Tran, Van-Nhan [1 ]
Lee, Suk-Hwan [2 ]
Le, Hoanh-Su [3 ]
Kwon, Ki-Ryong [1 ]
Affiliations
[1] Pukyong Natl Univ, Dept Artificial Intelligence Convergence, Busan 48513, South Korea
[2] Dong A Univ, Dept Comp Engn, Busan 49315, South Korea
[3] Vietnam Natl Univ Ho Chi Minh City, Univ Econ & Law, Fac Informat Syst, Ho Chi Minh City 700000, Vietnam
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 16
Funding
National Research Foundation of Singapore;
Keywords
DeepFake detection; computer vision and pattern recognition; artificial intelligence;
DOI
10.3390/app11167678
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
The rapid development of deep learning has enabled models that produce and synthesize hyper-realistic videos, known as DeepFakes. The growth of such forged data has raised concerns about malicious use. Detecting forged videos is therefore a crucial task in the field of digital media. Most current models are based on deep neural networks and vision transformers; the state-of-the-art (SOTA) model uses an EfficientNet-B7 backbone. However, because they rely on excessively large backbones, these models have the intrinsic drawback of being too heavy. In this research, we propose a high-performance DeepFake detection model for manipulated video that preserves accuracy while keeping the model weight appropriate. We build on previous work on distillation methodology but take a different approach, combining manual distillation extraction, target-specific region extraction, data augmentation, frame and multi-region ensembling, a CNN-based model, and flexible classification with a dynamic threshold. Our proposal also reduces overfitting, a common and particularly important problem affecting the quality of many models. To assess the quality of our model, we evaluated it on two datasets. On the DeepFake Detection Challenge (DFDC) dataset, our model obtains an AUC of 0.958 and an F1-score of 0.9243, compared with an AUC of 0.972 and an F1-score of 0.906 for the SOTA model; on the smaller Celeb-DF v2 dataset, it obtains an AUC of 0.978 and an F1-score of 0.9628.
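For illustration only, the sketch below shows one way the frame and multi-region ensemble and the dynamic-threshold classification mentioned in the abstract could be realized in Python. It is not the authors' implementation; the function names, the region count, the averaging-based score aggregation, and the validation-based threshold search are all assumptions.

```python
# Minimal sketch (assumptions, not the paper's code) of a frame and
# multi-region ensemble followed by classification with a dynamic threshold.
import numpy as np

def ensemble_video_score(region_scores: np.ndarray) -> float:
    """Aggregate per-frame, per-region fake probabilities into one video score.

    region_scores has shape (num_frames, num_regions), where each entry is a
    CNN's predicted probability that the cropped region is manipulated.
    """
    frame_scores = region_scores.mean(axis=1)   # multi-region ensemble per frame
    return float(frame_scores.mean())           # frame-level ensemble per video

def classify_with_dynamic_threshold(video_score: float,
                                    val_scores: np.ndarray,
                                    val_labels: np.ndarray) -> int:
    """Label a video as fake (1) or real (0) using a data-driven threshold.

    Here the threshold is chosen on a validation set as the cutoff that
    maximizes the F1-score, instead of a fixed 0.5 (one possible reading of
    "flexible classification with a dynamic threshold").
    """
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):
        preds = (val_scores >= t).astype(int)
        tp = np.sum((preds == 1) & (val_labels == 1))
        fp = np.sum((preds == 1) & (val_labels == 0))
        fn = np.sum((preds == 0) & (val_labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return int(video_score >= best_t)

# Example usage with random numbers standing in for real model outputs.
rng = np.random.default_rng(0)
scores = rng.uniform(size=(32, 4))              # 32 frames, 4 facial regions
video_score = ensemble_video_score(scores)
val_scores = rng.uniform(size=100)
val_labels = rng.integers(0, 2, size=100)
print(classify_with_dynamic_threshold(video_score, val_scores, val_labels))
```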
Pages: 14