Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches

Cited by: 77
Authors
Agarwal, Shruti [1]
Farid, Hany [1]
Fried, Ohad [2]
Agrawala, Maneesh [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Stanford Univ, Stanford, CA 94305 USA
DOI
10.1109/CVPRW50498.2020.00338
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in machine learning and computer graphics have made it easier to convincingly manipulate video and audio. These so-called deep-fake videos range from complete full-face synthesis and replacement (face-swap), to complete mouth and audio synthesis and replacement (lip-sync), and partial word-based audio and mouth synthesis and replacement. Detection of deep fakes with only a small spatial and temporal manipulation is particularly challenging. We describe a technique to detect such manipulated videos by exploiting the fact that the dynamics of the mouth shape (visemes) are occasionally inconsistent with a spoken phoneme. We focus on the visemes associated with words having the sound M (mama), B (baba), or P (papa), in which the mouth must completely close in order to pronounce these phonemes. We observe that this is not the case in many deep-fake videos. Such phoneme-viseme mismatches can, therefore, be used to detect even spatially small and temporally localized manipulations. We demonstrate the efficacy and robustness of this approach to detect different types of deep-fake videos, including in-the-wild deep fakes.
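The core check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that some upstream step has already produced (a) phoneme labels aligned to video frames and (b) a per-frame mouth-openness score in [0, 1]. The function name `find_mismatches`, the `window` and `closed_threshold` parameters, and the openness scale are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Phonemes that require the lips to close completely (per the abstract).
CLOSED_MOUTH_PHONEMES = {"M", "B", "P"}


def find_mismatches(
    phonemes: Sequence[Tuple[str, int]],
    openness: Sequence[float],
    window: int = 2,
    closed_threshold: float = 0.1,
) -> List[Tuple[str, int]]:
    """Return the (phoneme, frame) pairs where the mouth never closes.

    phonemes: (label, frame_index) pairs from a hypothetical aligner.
    openness: hypothetical per-frame mouth openness, 0.0 = fully closed.
    window: frames to inspect on each side of the phoneme (assumed value).
    closed_threshold: openness at or below this counts as closed (assumed).
    """
    mismatches = []
    for label, frame in phonemes:
        if label not in CLOSED_MOUTH_PHONEMES:
            continue  # only M, B, and P are checked
        lo = max(0, frame - window)
        hi = min(len(openness), frame + window + 1)
        if lo >= hi:
            continue  # phoneme falls outside the available frames
        # A mismatch: no frame near the phoneme shows a closed mouth.
        if min(openness[lo:hi]) > closed_threshold:
            mismatches.append((label, frame))
    return mismatches


if __name__ == "__main__":
    openness = [0.5, 0.4, 0.05, 0.4, 0.5, 0.5, 0.45, 0.5, 0.5, 0.5]
    phonemes = [("M", 2), ("AA", 4), ("B", 7)]
    print(find_mismatches(phonemes, openness))
```

In this toy input the mouth closes around the M at frame 2 but stays open around the B at frame 7, so only the B is flagged. A real detector would also need a reliable way to measure mouth closure and to align audio with video, neither of which is specified in this record.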
Pages: 2814-2822
Number of pages: 9