Dual-Modality Co-Learning for Unveiling Deepfake in Spatio-Temporal Space

Cited by: 2
Authors
Guan, Jiazhi [1]
Zhou, Hang [2]
Guo, Zhizhi [3]
Hu, Tianshu [2]
Deng, Lirui [1]
Quan, Chengbin [1]
Fang, Meng [4]
Zhao, Youjian [1]
Affiliations
[1] Tsinghua Univ, BNRist, DCST, Beijing, Peoples R China
[2] VIS Baidu Inc, Beijing, Peoples R China
[3] China Telecom, Beijing, Peoples R China
[4] Univ Liverpool, Liverpool, Merseyside, England
Keywords
Deepfake Detection; Digital Forensics; Spatio-Temporal Analysis
DOI
10.1145/3591106.3592284
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The emergence of photo-realistic deepfakes on a large scale has become a significant societal concern, which has garnered considerable attention from the research community. Several recent studies have identified the critical issue of "temporal inconsistency" resulting from the frame reassembling process of deepfake generation techniques. However, due to the lack of task-specific design, the spatio-temporal modeling of current methods remains insufficient in three critical aspects: 1) inapparent temporal changes are prone to be undermined compared to abundant spatial cues; 2) minor inconsistent regions are often concealed by motions with greater amplitude during downsampling; 3) capturing both transient inconsistencies and persistent motions simultaneously remains a significant challenge. In this paper, we propose a novel Dual-Modality Co-Learning framework tailored for these characteristics, which achieves more effective deepfake detection with complementary information from RGB and optical flow modalities. In particular, we designed a Multi-Scale Motion Regularization module to encourage the network to equally prioritize both the significant spatial cues and the subtle temporal facial motion cues. Additionally, we developed a Multi-Span Cross-Attention module to effectively integrate the information from both RGB and optical flow modalities and improve the detection accuracy with multi-span predictions. Extensive experiments validate the effectiveness of our ideas and demonstrate the superior performance of our approach.
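The abstract does not give the internals of the Multi-Span Cross-Attention module, so the following is only an illustrative sketch of the general idea of fusing two modalities with cross-attention over several temporal window lengths. It is not the authors' implementation. The function names (`cross_attention`, `multi_span_fusion`), the reading of "multi-span" as non-overlapping temporal windows of different lengths, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product cross-attention: every query attends over all keys
    # and returns a weighted average of the corresponding values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_span_fusion(rgb_tokens, flow_tokens, spans=(2, 4)):
    # Hypothetical helper: for each span length, split the sequence into
    # non-overlapping windows and let the RGB tokens of a window attend to
    # the optical-flow tokens of the same window.
    fused = {}
    for span in spans:
        windows = []
        for start in range(0, len(rgb_tokens) - span + 1, span):
            rgb_win = rgb_tokens[start:start + span]
            flow_win = flow_tokens[start:start + span]
            windows.append(cross_attention(rgb_win, flow_win, flow_win))
        fused[span] = windows
    return fused

# Toy per-frame features (made up): 4 time steps, 2 dimensions each.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
flow = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4], [0.1, 0.0]]
fused = multi_span_fusion(rgb, flow)
```

In this sketch a short span restricts attention to nearby frames, while a long span lets each RGB token see flow features across the whole clip; a real model would typically use learned projections and a per-span classification head, which are omitted here.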
Pages: 85-94
Page count: 10