LISTENING BETWEEN THE LINES: SYNTHETIC SPEECH DETECTION DISREGARDING VERBAL CONTENT

Authors
Salvi, Davide [1]
Balcha, Temesgen Semu [1]
Bestagini, Paolo [1]
Tubaro, Stefano [1]
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy
Keywords
Audio Forensics; Synthetic Speech; Background Noise; Explainability
DOI
10.1109/ICASSPW62465.2024.10669901
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recent advancements in synthetic speech generation have led to the creation of forged audio data that are almost indistinguishable from real speech. This phenomenon poses a new challenge for the multimedia forensics community, as the misuse of synthetic media can potentially cause adverse consequences. Several methods have been proposed in the literature to mitigate potential risks and detect synthetic speech, mainly focusing on the analysis of the speech itself. However, recent studies have revealed that the most crucial frequency bands for detection lie in the highest ranges (above 6000 Hz), which do not include any speech content. In this work, we extensively explore this aspect and investigate whether synthetic speech detection can be performed by focusing only on the background component of the signal while disregarding its verbal content. Our findings indicate that the speech component is not the predominant factor in performing synthetic speech detection. These insights provide valuable guidance for the development of new synthetic speech detectors and their interpretability, together with some considerations on the existing work in the audio forensics field.
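As a rough illustration of the frequency split mentioned in the abstract, the sketch below measures how much of a signal's spectral energy lies at or above 6000 Hz. It is not the authors' detector. The function name `high_band_energy_ratio`, the 16 kHz sampling rate, and the two test tones are assumptions made for this example only, and the naive DFT is used just to keep the snippet dependency-free.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz=6000.0):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Hypothetical helper for illustration: uses a naive O(N^2) DFT over the
    positive-frequency bins of a real signal and skips the DC bin.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(1, n // 2 + 1):
        # DFT coefficient for bin k
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        # Bin k corresponds to frequency k * sample_rate / n
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


# Assumed toy signal: a 200 Hz tone (speech-range stand-in) plus a weaker
# 7000 Hz tone (high-band stand-in), 20 ms at an assumed 16 kHz rate.
SAMPLE_RATE = 16000
N = 320
low_only = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(N)]
mixed = [
    low_only[t] + 0.5 * math.sin(2 * math.pi * 7000 * t / SAMPLE_RATE)
    for t in range(N)
]

print(high_band_energy_ratio(low_only, SAMPLE_RATE))
print(high_band_energy_ratio(mixed, SAMPLE_RATE))
```

Both tones fall exactly on DFT bins, so the mixed signal's ratio is close to 0.25 / 1.25 = 0.2 by construction. For real recordings, an FFT routine and a windowed, frame-wise analysis would be the practical choice.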
Pages: 883-887
Page count: 5