DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

Cited by: 22

Authors:

Conti, Emanuele [1]

Salvi, Davide [1]

Borrelli, Clara [1]

Hosler, Brian [2]

Bestagini, Paolo [1]

Antonacci, Fabio [1]

Sarti, Augusto [1]

Stamm, Matthew C. [2]

Tubaro, Stefano [1]

Affiliations:

[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy

[2] Drexel University, Department of Electrical and Computer Engineering, Philadelphia, PA 19104, USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

deepfake; audio forensics; deep learning;

DOI:

10.1109/ICASSP43922.2022.9747186

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task. This situation makes it crucial to design systems that can detect deepfakes at both the video and audio levels. In this paper, we propose a new audio spoofing detection system leveraging emotional features. The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior. Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system. Because the descriptors used capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.

Pages: 8962-8966

Number of pages: 5