Human detection of political speech deepfakes across transcripts, audio, and video

Cited by: 4

Authors:

Groh, Matthew ^{[1]}

Sankaranarayanan, Aruna ^{[2,3]}

Singh, Nikhil ^{[2]}

Kim, Dong Young ^{[2]}

Lippman, Andrew ^{[2]}

Picard, Rosalind ^{[2]}

Affiliations:

[1] Northwestern Univ, Kellogg Sch Management, Evanston, IL 60208 USA

[2] MIT, Media Lab, Cambridge, MA USA

[3] MIT, CSAIL, Cambridge, MA USA

Source:

NATURE COMMUNICATIONS | 2024 / Vol. 15 / Issue 01

Keywords:

SOCIAL MEDIA; NEWS; MISINFORMATION; DISINFORMATION; ATTENTION; KNOWLEDGE; SCIENCE; PHOTOS; IMPACT;

DOI:

10.1038/s41467-024-51998-z

Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video. We conduct 5 pre-registered randomized experiments with N = 2215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings with and without priming, and media modalities. We do not find base rates of misinformation have statistically significant effects on discernment. We find deepfakes with audio produced by the state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice actor audio. Moreover, across all experiments and question framings, we find audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said, the audio-visual cues, than what is said, the speech content.

With advances in generative AI, political speech deepfakes are becoming more realistic. Here, the authors show that people's ability to distinguish between real and fake speeches relies on audio and visual information more than the speech content.

Pages: 16
