DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

Times cited: 22

Authors:

Conti, Emanuele [1]

Salvi, Davide [1]

Borrelli, Clara [1]

Hosler, Brian [2]

Bestagini, Paolo [1]

Antonacci, Fabio [1]

Sarti, Augusto [1]

Stamm, Matthew C. [2]

Tubaro, Stefano [1]

Affiliations:

[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy

[2] Drexel University, Department of Electrical and Computer Engineering, Philadelphia, PA 19104, USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

deepfake; audio forensics; deep learning

DOI:

10.1109/ICASSP43922.2022.9747186

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables multimedia content to spread worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task. This makes it crucial to design systems that can detect deepfakes at both the video and the audio level. In this paper, we propose a new audio spoofing detection system that leverages emotional features. The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior. Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system. Because these descriptors capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.
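The two-stage pipeline the abstract describes (an SER front end whose high-level embeddings feed a separate spoof/bona fide classifier) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ser_embed` callable is a hypothetical stand-in for the pretrained SER network, and the logistic-regression back end, the per-dimension standardization, and the names `EmotionFeatureDetector` and `spoof_score` are choices made only for this sketch, since the abstract does not specify the classifier.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: maps a waveform to a fixed-size emotion embedding
# (e.g. a hidden-layer activation of a pretrained SER network).
SerEmbedder = Callable[[Sequence[float]], Vector]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _standardize_fit(X: List[Vector]) -> Tuple[Vector, Vector]:
    # Per-dimension mean and standard deviation (a std of 0 is replaced by 1).
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    sd = [math.sqrt(sum((x[j] - mu[j]) ** 2 for x in X) / n) or 1.0
          for j in range(d)]
    return mu, sd


class EmotionFeatureDetector:
    """Binary spoof detector on SER embeddings. Label 1 = spoof, 0 = bona fide.

    Assumption: a logistic-regression back end trained by full-batch
    gradient descent; the paper's actual classifier may differ.
    """

    def __init__(self, ser_embed: SerEmbedder, lr: float = 0.5, epochs: int = 300):
        self.ser_embed = ser_embed  # hypothetical SER front end, injected
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0
        self.mu: Vector = []
        self.sd: Vector = []

    def _normalize(self, e: Vector) -> Vector:
        # Apply the training-set statistics to an embedding.
        return [(v - m) / s for v, m, s in zip(e, self.mu, self.sd)]

    def _logit(self, x: Vector) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def fit(self, audios: List[Sequence[float]], labels: List[int]) -> "EmotionFeatureDetector":
        raw = [self.ser_embed(a) for a in audios]  # SER embeddings of training audio
        self.mu, self.sd = _standardize_fit(raw)   # statistics from training data only
        X = [self._normalize(e) for e in raw]
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for x, y in zip(X, labels):
                # Gradient of the log-loss with respect to the logit.
                err = _sigmoid(self._logit(x)) - y
                for j in range(d):
                    gw[j] += err * x[j]
                gb += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def spoof_score(self, audio: Sequence[float]) -> float:
        # Probability-like score in [0, 1] that the input is a deepfake.
        return _sigmoid(self._logit(self._normalize(self.ser_embed(audio))))
```

Injecting the embedder keeps the back end independent of which SER model is used. For a cross-dataset check in the spirit of the abstract, one would call `fit` on one corpus and `spoof_score` on another, reusing the normalization statistics estimated on the training corpus.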

Pages: 8962-8966

Number of pages: 5
