DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

Cited by: 22
Authors
Conti, Emanuele [1]
Salvi, Davide [1]
Borrelli, Clara [1]
Hosler, Brian [2]
Bestagini, Paolo [1]
Antonacci, Fabio [1]
Sarti, Augusto [1]
Stamm, Matthew C. [2]
Tubaro, Stefano [1]
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy
[2] Drexel University, Department of Electrical and Computer Engineering, Philadelphia, PA 19104, USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
deepfake; audio forensics; deep learning
DOI
10.1109/ICASSP43922.2022.9747186
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and reliability. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task. This situation makes it crucial to design systems that can detect deepfakes at both the video and audio levels. In this paper, we propose a new audio spoofing detection system leveraging emotional features. The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior. Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system. Because the descriptors used capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.
Pages: 8962-8966
Page count: 5