LISTENING BETWEEN THE LINES: SYNTHETIC SPEECH DETECTION DISREGARDING VERBAL CONTENT

Times cited: 0
Authors
Salvi, Davide [1]
Balcha, Temesgen Semu [1]
Bestagini, Paolo [1]
Tubaro, Stefano [1]
Affiliation
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy
Keywords
Audio Forensics; Synthetic Speech; Background Noise; Explainability
DOI
10.1109/ICASSPW62465.2024.10669901
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Recent advancements in synthetic speech generation have led to forged audio that is almost indistinguishable from real speech. This poses a new challenge for the multimedia forensics community, since the misuse of synthetic media can have harmful consequences. Several methods have been proposed in the literature to mitigate these risks by detecting synthetic speech, mostly by analyzing the speech itself. However, recent studies have revealed that the frequency bands most important for detection lie in the highest range (above 6000 Hz), which does not include any speech content. In this work, we explore this aspect extensively and investigate whether synthetic speech detection can be performed by focusing only on the background component of the signal while disregarding its verbal content. Our findings indicate that the speech component is not the predominant factor in synthetic speech detection. These insights offer guidance for developing new synthetic speech detectors and interpreting their decisions, and they prompt some considerations on existing work in the audio forensics field.
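The abstract's observation that the most informative bands lie above 6000 Hz can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' pipeline. It applies a brick-wall mask to a naive DFT in order to keep only the content at or above a cutoff, and it measures what fraction of a frame's energy falls in that band. The default cutoff is 6000 Hz, the threshold quoted in the abstract. The function names (`dft`, `idft`, `bin_frequency`, `keep_high_band`, `high_band_energy_ratio`) and the masking approach are hypothetical choices made for illustration. The abstract does not say how the background component is separated from the verbal content.

```python
import cmath
import math


# Assumption: a brick-wall DFT mask stands in for whatever band split or
# background/speech separation the authors used; all names are hypothetical.


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns only the real part because the input frame is real."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def bin_frequency(k, n, sample_rate):
    """Absolute frequency in Hz of DFT bin k (bins above n/2 mirror negative frequencies)."""
    return min(k, n - k) * sample_rate / n


def keep_high_band(x, sample_rate, cutoff_hz=6000.0):
    """Zero every DFT bin below cutoff_hz and return the remaining high-band signal."""
    n = len(x)
    masked = [
        c if bin_frequency(k, n, sample_rate) >= cutoff_hz else 0j
        for k, c in enumerate(dft(x))
    ]
    return idft(masked)


def high_band_energy_ratio(x, sample_rate, cutoff_hz=6000.0):
    """Fraction of the frame's spectral energy at or above cutoff_hz (0.0 for silence)."""
    n = len(x)
    power = [abs(c) ** 2 for c in dft(x)]
    high = sum(p for k, p in enumerate(power) if bin_frequency(k, n, sample_rate) >= cutoff_hz)
    total = sum(power)
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    fs = 16000  # sample rate in Hz
    n = 256     # frame length; bins are fs / n = 62.5 Hz apart
    # A strong 1000 Hz tone (stand-in for low-band content) plus a weak 7000 Hz tone.
    frame = [
        math.sin(2 * math.pi * 1000 * t / fs) + 0.1 * math.sin(2 * math.pi * 7000 * t / fs)
        for t in range(n)
    ]
    print(f"energy above 6 kHz: {high_band_energy_ratio(frame, fs):.4f}")
```

As an example, take a 16 kHz frame that mixes a 1000 Hz tone with a weaker 7000 Hz tone. `keep_high_band` returns only the 7000 Hz component. The naive O(N^2) DFT keeps the block dependency-free. On real recordings, an FFT (for example `numpy.fft.rfft`) over short-time frames would be the more practical choice. A hard frequency mask is also only a crude proxy for "disregarding verbal content", because it removes everything below the cutoff, background included.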
Pages: 883-887
Page count: 5
Related papers (50 in total)
  • [21] A Comparison of Features for Synthetic Speech Detection
    Sahidullah, Md
    Kinnunen, Tomi
    Hanilci, Cemal
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2087-2091
  • [22] Open Challenges in Synthetic Speech Detection
    Cuccovillo, Luca
    Papastergiopoulos, Christoforos
    Vafeiadis, Anastasios
    Yaroshchuk, Artem
    Aichroth, Patrick
    Votis, Konstantinos
    Tzovaras, Dimitrios
    2022 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY (WIFS), 2022
  • [23] A SCHEME DISCRIMINATING BETWEEN SYNTHETIC SPEECH AND NORMAL SPEECH
    Chen, Jilun
    Zhang, Weiqiang
    Liu, Jia
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016: 683-688
  • [24] Relation Between Listening Effort and Speech Intelligibility in Noise
    Krueger, Melanie
    Schulte, Michael
    Zokoll, Melanie A.
    Wagener, Kirsten C.
    Meis, Markus
    Brand, Thomas
    Holube, Inga
    AMERICAN JOURNAL OF AUDIOLOGY, 2017, 26 (03): 378-392
  • [25] LISTENING BETWEEN THE LINES - A CULTURAL APPROACH - LOUGHEED,L
    WHITTAKER, PF
    MODERN LANGUAGE JOURNAL, 1986, 70 (02): 206-207
  • [26] Auditory verbal hallucinations in schizophrenia as aberrant lateralized speech perception: Evidence from dichotic listening
    Hugdahl, Kenneth
    Loberg, Else-Marie
    Falkenberg, Liv E.
    Johnsen, Erik
    Kompus, Kristiina
    Kroken, Rune A.
    Nygard, Merethe
    Westerhausen, Rene
    Alptekin, Koksal
    Ozgoren, Murat
    SCHIZOPHRENIA RESEARCH, 2012, 140 (1-3): 59-64
  • [27] JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
    Xin, Detai
    Jiang, Junfeng
    Takamichi, Shinnosuke
    Saito, Yuki
    Aizawa, Akiko
    Saruwatari, Hiroshi
    arXiv, 2023
  • [28] Listening between the Lines Introduction: Exploring the Interdisciplinarity between Music and Literature
    Griffiths, Christian
    Trevitt, Jessica
    AUSTRALIAN LITERARY STUDIES, 2014, 29 (1-2): 1-11
  • [29] JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions
    Xin, Detai
    Jiang, Junfeng
    Takamichi, Shinnosuke
    Saito, Yuki
    Aizawa, Akiko
    Saruwatari, Hiroshi
    IEEE ACCESS, 2024, 12: 19752-19764
  • [30] Modulation of neural responses to speech by directing attention to voices or verbal content
    von Kriegstein, K
    Eger, E
    Kleinschmidt, A
    Giraud, AL
    COGNITIVE BRAIN RESEARCH, 2003, 17 (01): 48-55