DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

Cited by: 22

Authors:

Conti, Emanuele [1]

Salvi, Davide [1]

Borrelli, Clara [1]

Hosler, Brian [2]

Bestagini, Paolo [1]

Antonacci, Fabio [1]

Sarti, Augusto [1]

Stamm, Matthew C. [2]

Tubaro, Stefano [1]

Affiliations:

[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy

[2] Drexel University, Department of Electrical and Computer Engineering, Philadelphia, PA 19104, USA

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

deepfake; audio forensics; deep learning

DOI:

10.1109/ICASSP43922.2022.9747186

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables multimedia content to spread worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task. This situation makes it crucial to design systems that can detect deepfakes at both the video and audio levels. In this paper, we propose a new audio spoofing detection system leveraging emotional features. The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior. Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system. Because the descriptors used capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.
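
The two-stage idea in the abstract (extract high-level features with an SER system, then classify them as real or fake) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `ser_embedding` is a hypothetical stand-in for a pretrained SER network's embedding, the logistic-regression detector is a swapped-in simple classifier, and the Gaussian-noise clips are synthetic placeholders rather than speech data.

```python
import math
import random


def ser_embedding(waveform):
    """Hypothetical stand-in for the embedding of a pretrained SER network.

    A real system would return a learned high-level vector; this toy version
    returns two hand-made statistics so the sketch runs without a model.
    """
    n = len(waveform)
    mean_abs = sum(abs(x) for x in waveform) / n
    mean_delta = sum(abs(waveform[i] - waveform[i - 1]) for i in range(1, n)) / (n - 1)
    return [mean_abs, mean_delta]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_detector(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression detector (1 = fake, 0 = real) with plain SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def fake_probability(model, waveform):
    """Score one clip: embed it, then apply the trained detector."""
    weights, bias = model
    x = ser_embedding(waveform)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def toy_clip(rng, scale, length=200):
    """Synthetic placeholder audio: Gaussian noise, NOT real speech data."""
    return [rng.gauss(0.0, scale) for _ in range(length)]


def make_split(rng, per_class):
    """Build a shuffled list of (clip, label) pairs; labels are arbitrary toy classes."""
    clips = [(toy_clip(rng, 1.0), 0) for _ in range(per_class)]
    clips += [(toy_clip(rng, 0.3), 1) for _ in range(per_class)]
    rng.shuffle(clips)
    return clips


rng = random.Random(0)
train = make_split(rng, 20)
test = make_split(rng, 10)

model = train_detector([ser_embedding(w) for w, _ in train], [y for _, y in train])
accuracy = sum((fake_probability(model, w) > 0.5) == bool(y) for w, y in test) / len(test)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

The point of the split between `ser_embedding` and `train_detector` is that the detector never sees the raw waveform, only the embedding, which mirrors the abstract's claim that high-level descriptors are what carry over across datasets. The toy accuracy says nothing about real deepfake detection performance.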


Pages: 8962-8966

Page count: 5
