SSLCT: A Convolutional Transformer for Synthetic Speech Localization

Authors
Bhagtani, Kratika [1]
Yadav, Amit Kumar Singh [1]
Bestagini, Paolo [2]
Delp, Edward J. [1]
Affiliations
[1] Purdue Univ, Video & Image Proc Lab VIPER, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Milan, Italy
Keywords
Synthetic speech localization; speech forensics; deepfake speech; PartialSpoof; transformer
DOI
10.1109/MIPR62202.2024.00028
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. Because synthetic speech can be used for nefarious purposes, speech forensics methods have been developed to detect fully synthetic speech. Speech editing tools can also create partially synthetic speech, in which only part of the speech signal is synthetic. Detecting these short synthetic segments requires specialized methods that determine the temporal location of the synthetic speech within the signal. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as short as 20 milliseconds. We demonstrate that SSLCT achieves an Equal Error Rate (EER) below 10%, an improvement over several existing methods.
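The abstract reports results as an Equal Error Rate at a 20-millisecond temporal resolution. As a rough illustration of what such a metric measures, the sketch below labels fixed 20 ms segments as synthetic or bona fide and then computes an approximate EER from per-segment scores. It is not the authors' evaluation code: the function names `segment_labels` and `equal_error_rate`, the overlap rule for labeling a segment, and the threshold sweep are all assumptions made for illustration, and the abstract does not specify the exact protocol.

```python
def segment_labels(duration_ms, synthetic_spans_ms, hop_ms=20):
    """Label each fixed-length segment 1 (synthetic) or 0 (bona fide).

    Assumption: a segment counts as synthetic if it overlaps any
    synthetic span at all. Times are integer milliseconds.
    """
    n_segments = duration_ms // hop_ms
    labels = []
    for i in range(n_segments):
        start, end = i * hop_ms, (i + 1) * hop_ms
        overlaps = any(start < b and end > a for a, b in synthetic_spans_ms)
        labels.append(1 if overlaps else 0)
    return labels


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    scores: higher means "more likely synthetic".
    labels: 1 = synthetic, 0 = bona fide.
    Returns the mean of the false-acceptance and false-rejection rates
    at the threshold where the two rates are closest.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one segment of each class")

    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)):
        # A segment is predicted synthetic when its score is >= t.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_accepts / negatives
        frr = false_rejects / positives
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical 200 ms utterance with a synthetic insert from 40 ms to 80 ms.
labels = segment_labels(200, [(40, 80)])
# Made-up detector scores, one per 20 ms segment.
scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.2, 0.1, 0.4, 0.2]
eer = equal_error_rate(scores, labels)
```

In this toy case the two synthetic segments receive the highest scores, so a threshold exists that separates the classes perfectly. Real systems yield overlapping score distributions, and the EER is the operating point where the two error rates balance.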
Pages: 134-140
Number of pages: 7