A Robust Approach to Multimodal Deepfake Detection

Cited by: 9
Authors
Salvi, Davide [1]
Liu, Honggu [2]
Mandelli, Sara [1]
Bestagini, Paolo [1]
Zhou, Wenbo [2]
Zhang, Weiming [2]
Tubaro, Stefano [1]
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, I-20133 Milan, Italy
[2] University of Science and Technology of China, School of Cyber Science and Technology, Hefei 230026, People's Republic of China
Keywords
deepfake detection; video forensics; audio forensics; multimodality
DOI
10.3390/jimaging9060122
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
The widespread use of deep learning techniques for creating realistic synthetic media, commonly known as deepfakes, poses a significant threat to individuals, organizations, and society. As the malicious use of these data could lead to unpleasant situations, it is becoming crucial to distinguish between authentic and fake media. Nonetheless, although deepfake generation systems can create convincing images and audio, they may struggle to maintain consistency across different data modalities, for example when producing a realistic video sequence in which both the visual frames and the speech are fake and consistent with each other. Moreover, these systems may not accurately reproduce semantic and temporal aspects. All these elements can be exploited to perform robust detection of fake content. In this paper, we propose a novel approach for detecting deepfake video sequences by leveraging data multimodality. Our method extracts audio-visual features from the input video over time and analyzes them using time-aware neural networks. We exploit both the video and audio modalities to leverage the inconsistencies between and within them, enhancing the final detection performance. The peculiarity of the proposed method is that we never train on multimodal deepfake data, but on disjoint monomodal datasets that contain visual-only or audio-only deepfakes. This frees us from relying on multimodal datasets during training, which is desirable given their scarcity in the literature. Moreover, at test time, it allows us to evaluate the robustness of our proposed detector on unseen multimodal deepfakes. We test different fusion techniques between data modalities and investigate which one leads to more robust predictions by the developed detectors. Our results indicate that a multimodal approach is more effective than a monomodal one, even if trained on disjoint monomodal datasets.
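The abstract mentions testing different fusion techniques between the audio and video modalities but does not say which ones. As a rough illustration only, the sketch below shows one generic family, score-level (late) fusion, in which two separately trained monomodal detectors each emit a fake-probability per time window and the scores are combined afterwards. The function names, the `average` and `max` rules, and the 0.5 threshold are all assumptions made for this example, not the authors' method.

```python
from statistics import mean


def fuse_scores(video_scores, audio_scores, mode="average"):
    """Combine per-window fake probabilities from two monomodal detectors.

    Hypothetical helper: both inputs are lists of floats in [0, 1],
    one score per time window, aligned window by window.
    """
    if len(video_scores) != len(audio_scores):
        raise ValueError("score sequences must cover the same time windows")
    if mode == "average":
        # Treat both modalities as equally reliable.
        return [(v + a) / 2 for v, a in zip(video_scores, audio_scores)]
    if mode == "max":
        # Flag a window if either modality looks fake.
        return [max(v, a) for v, a in zip(video_scores, audio_scores)]
    raise ValueError(f"unknown fusion mode: {mode}")


def is_fake(video_scores, audio_scores, mode="average", threshold=0.5):
    """Label a whole clip as fake when the mean fused score exceeds the threshold."""
    fused = fuse_scores(video_scores, audio_scores, mode)
    return mean(fused) > threshold


# Example: real-looking frames paired with fake-sounding speech.
video = [0.125, 0.125]
audio = [0.875, 0.875]
print(is_fake(video, audio, mode="average"))
print(is_fake(video, audio, mode="max"))
```

In this toy case the `max` rule flags a clip in which only one modality is manipulated, while plain averaging can dilute a strong single-modality cue. This only illustrates why the choice of fusion rule affects robustness and says nothing about which rule performed best in the paper.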
Pages: 18